
[Machine Learning] Leco score (v2)

Jun 15, 2024
ML Research

(EDIT: Midjourney has since implemented personalized styles, but based on choices/rankings rather than individual scores for each image.)

Leco score v1: https://civitai.com/articles/4216

Leco score v1 sample: https://civitai.com/models/317942

Leco score v2: this

Leco score v2 sample: https://civitai.com/models/471936

Leco score v2 training guide: https://civitai.com/articles/5422

New features in v2:

  • the code is cleaner and uses the standard ResNet implementation from timm

  • the training set is composed of images from AVA (photo), Danbooru (anime), Imagenet (realistic) and Wikiart (paintings), with approximately 250k images each

  • the criteria for the initial training of the resnet in latent space are: 1D-EMD and classification on AVA; NSFW classification and tag classification on Danbooru; style classification and year regression on WikiArt; and of course classification on ImageNet

  • to mix all the criteria, the obvious solution would have been to sum them, but that raises the question of how to scale the different criteria so that they converge at the same time

  • instead, a Lion optimiser (scaleless, since it takes the sign of the gradient/momentum) is assigned to each criterion, and each optimiser is stepped independently

  • at 'convergence', progress in one validation metric makes another validation metric worse

  • this method seems to be explainable in terms of meta-learning/MAML: it tries to find a resnet that is equidistant (in number of necessary fine-tuning steps) from solving AVA classification, Danbooru tag classification, WikiArt style classification and ImageNet classification

  • I can't find the paper, but information not present in a network at the start of training is difficult to teach it later (it's the paper where they train on half-images and the missing side never catches up during training), so hopefully this will fine-tune easily

  • this was a pain to train: latent space has a normal distribution, resnets are hard to train, and meta-learning was tricky; doing all three on my 2060 was probably not a bright idea
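The independent-optimiser idea above can be sketched in a few lines of PyTorch. `LionLike` is a simplified sign-of-momentum update (not the reference Lion implementation), and the backbone/heads are tiny stand-ins for the latent-space resnet and its task heads:

```python
import torch
import torch.nn as nn

class LionLike(torch.optim.Optimizer):
    """Simplified Lion-style update: step along -lr * sign(momentum mix)."""
    def __init__(self, params, lr=1e-4, beta1=0.9, beta2=0.99):
        super().__init__(params, dict(lr=lr, beta1=beta1, beta2=beta2))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                m = self.state[p].setdefault("m", torch.zeros_like(p))
                b1, b2 = group["beta1"], group["beta2"]
                # scaleless: only the sign of the momentum mix is used
                p.add_((b1 * m + (1 - b1) * p.grad).sign(), alpha=-group["lr"])
                m.mul_(b2).add_(p.grad, alpha=1 - b2)

backbone = nn.Linear(16, 8)                 # stand-in for the shared resnet
heads = {"ava": nn.Linear(8, 10), "tags": nn.Linear(8, 100)}

# one optimiser per criterion, each covering the shared backbone + its head
optims = {name: LionLike(list(backbone.parameters()) + list(h.parameters()))
          for name, h in heads.items()}

x = torch.randn(4, 16)
w0 = backbone.weight.detach().clone()
for name, head in heads.items():            # step each criterion independently
    optims[name].zero_grad()
    loss = head(backbone(x)).pow(2).mean()  # dummy per-criterion loss
    loss.backward()
    optims[name].step()
```

Because every update has magnitude `lr` regardless of the loss scale, no per-criterion scaling constants are needed.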

Rating images

  • 10k highly rated images were downloaded from civitai

  • a very small flask website/app lets you rate those images on a scale from 1 to 5 stars (see the training guide)
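A minimal version of such a rating endpoint might look like this (the route, payload and in-memory storage are hypothetical; the actual mini-site lives in the rate_images folder):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)
ratings = {}  # image filename -> stars; the real app would persist this

@app.route("/rate", methods=["POST"])
def rate():
    data = request.get_json()
    stars = int(data["stars"])
    if not 1 <= stars <= 5:               # ratings are 1 to 5 stars
        return jsonify(error="stars must be 1-5"), 400
    ratings[data["image"]] = stars
    return jsonify(ok=True)
```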

Code files to do the initial training:

  • make_all_data.py takes the raw AVA/Imagenet/Wikiart/Booru files and turns them into latent tensors

  • divide_safetensors.py takes those tensor files, does some additional postprocessing, and divides the files into smaller chunks to fit into memory

  • train_resnet_lightning.py does the initial training
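The chunking step can be illustrated with plain torch (the sizes are made up; the real script works on safetensors files):

```python
import torch

# one large file of latents, split into pieces small enough for memory
latents = torch.randn(1000, 4, 64, 64)   # stand-in for one dataset file
chunks = torch.split(latents, 256)       # at most 256 samples per chunk
# each chunk would then be written back out, e.g. with safetensors
```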

Code files for fine-tuning:

  • the rate_images folder contains the flask mini-site

  • ratings_to_tensor.py takes the data from the rating mini-site and converts it into tensors

  • finetune_resnet.py finetunes the resnet to match the given ratings (it can use the tensor output above or any file that has 'latents' and 'float score' data)
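A hedged sketch of what the fine-tuning objective amounts to; the scorer below is a stand-in for the pretrained latent-space resnet, and the tensors stand in for the 'latents'/'float score' data produced by ratings_to_tensor.py:

```python
import torch
import torch.nn as nn

scorer = nn.Sequential(nn.Flatten(), nn.Linear(4 * 64 * 64, 1))  # stand-in
latents = torch.randn(32, 4, 64, 64)        # VAE latents of the rated images
stars = torch.randint(1, 6, (32,)).float()  # the 1-5 star ratings

opt = torch.optim.AdamW(scorer.parameters(), lr=1e-4)
for _ in range(20):
    opt.zero_grad()
    pred = scorer(latents).squeeze(-1)      # score signature: latent -> scalar
    loss = nn.functional.mse_loss(pred, stars)
    loss.backward()
    opt.step()
```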

Making a Lora

  • using the Leco codebase, the resnet is instantiated before the main loop, and the loss function is replaced with '-score(image.sample)' to maximise the score

  • there is a training file with PartiPrompts (the one I usually use for training Leco), I'm still waiting for Google Research to make the Gecko evaluation prompts available (a set of prompts class-balanced in terms of required skills) https://arxiv.org/abs/2404.16820
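The loss swap itself is a one-liner; here is a toy stand-alone version of the idea (all modules are stand-ins for the real unet/LoRA and scorer, with sizes shrunk for the sketch):

```python
import torch
import torch.nn as nn

score = nn.Sequential(nn.Flatten(), nn.Linear(4 * 8 * 8, 1))  # frozen scorer
for p in score.parameters():
    p.requires_grad_(False)

lora_like = nn.Conv2d(4, 4, 1)       # stands in for the LoRA-adapted unet
opt = torch.optim.Adam(lora_like.parameters(), lr=1e-3)

noisy = torch.randn(2, 4, 8, 8)
with torch.no_grad():
    s0 = score(lora_like(noisy)).mean().item()
for _ in range(5):
    opt.zero_grad()
    sample = lora_like(noisy)        # plays the role of image.sample
    loss = -score(sample).mean()     # maximise the score by minimising -score
    loss.backward()
    opt.step()
```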

Other training methods/networks

  • it should be possible to train using the sd-script Lora loop by replacing 'loss = MSE(predicted_noise_t_t+1, actual_noise_t_t+1)' with 'loss=-score(latents)', but you will have to reconstruct the latents from the noise and d_noise_pred. This would introduce some bias (the score is only improved along the given images) but make things a lot faster.

  • the sliders codebase is almost identical to the Leco codebase, so it can be modified in the same manner (add a resnet instantiation, replace 'loss=prompt_loss()' with 'loss=-score(latent.sample)')

  • I haven't tried adding the following in the main diffusion loop; I'd be surprised if something so simple worked (since fine-tuning the resnet takes a minute, this would amount to fewer-than-few-shot style learning, Lora being few-shot), especially given there is already research on zero-shot style copying by injecting a reference image into parts of the unet ( https://github.com/naver-ai/Visual-Style-Prompting )

    • x.requires_grad_(True)

    • temp = score(x)

    • temp.backward()

    • d_x = x.grad

    • x += 0.1*d_x  (under torch.no_grad())

    • (EDIT: this actually works outside of the diffusion loop; fully diffused images can be marginally nudged towards a higher-scoring image)

  • in terms of Lora training code, v1 and v2 are very similar: both require the instantiation of a pytorch module and the path to a checkpoint, and the module always has the same score signature (latent 4x64x64 -> 1)
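The per-image gradient step sketched earlier can be made runnable; `score` is again a stand-in for the trained scorer, applied to a fully diffused latent (sizes shrunk from 4x64x64 for the sketch):

```python
import torch
import torch.nn as nn

score = nn.Sequential(nn.Flatten(), nn.Linear(4 * 8 * 8, 1))  # stand-in scorer

x = torch.randn(1, 4, 8, 8, requires_grad=True)  # fully diffused latent
s0 = score(x).item()
for _ in range(10):
    temp = score(x).sum()
    temp.backward()
    with torch.no_grad():
        x += 0.1 * x.grad        # small step up the score gradient
    x.grad = None                # reset for the next iteration
```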
