UPDATED (31/08/24):
Welcome to my streamlined and updated guide on training LoRAs using AMD GPUs. This guide is specifically tailored for environments running ROCm 6.2 on the latest Ubuntu 24.04 "Noble Numbat" release, incorporating the latest PyTorch libraries.
WHAT'S NEW:
ROCm 6.2 Integration: Full support for the latest ROCm drivers, ensuring optimal performance on AMD GPUs.
Updated for Ubuntu 24.04: Tailored installation instructions for the newest Ubuntu "Noble Numbat" release.
Latest PyTorch Compatibility: Fully compatible with the most recent PyTorch version for seamless training experiences.
SETTING UP THE TRAINING ENVIRONMENT:
I won't lie, this can be a ball ache, but once done, it works and works well for the most part. Even on my older 6900 XT I can train a high-quality LoRA in around 40 minutes (8 epochs, 70 images, probably overkill for an anime model using the CAME optimizer), realistic or otherwise.
Installing Derrian's LoRA Easy Training Scripts (Dev Branch)
To set up Derrian's LoRA Easy Training Scripts, especially for AMD GPUs, follow these steps. While the stable branch might work with the latest updates, we’ll use the dev branch as it's a proven method. You can experiment with the recently updated stable branch if you'd like.
Step 1: Clone the Repository (Dev Branch)
First, clone the dev branch of Derrian's LoRA Easy Training Scripts by typing the following command in your terminal:
git clone -b dev https://github.com/derrian-distro/LoRA_Easy_Training_Scripts.git
This command downloads the necessary files from the repository into your local machine.
Step 2: Navigate to the Project Directory
After cloning the repository, switch into the newly created directory:
cd LoRA_Easy_Training_Scripts
Step 3: Initialize and Update Git Submodules
Within the project directory, you need to initialize and update the submodules:
git submodule init
git submodule update
These commands ensure that all the necessary submodules, which are dependencies for the main project, are correctly set up.
Step 4: Manually Clone the Kohya_ss Scripts
The Kohya_ss scripts sometimes fail to pull through with the submodule update, so they need to be cloned manually:
cd backend
git clone https://github.com/kohya-ss/sd-scripts.git
After cloning the Kohya_ss scripts, you'll need to perform a bit of cleanup:
Delete the empty sd_scripts directory that was initially created in the backend folder, then rename the newly cloned sd-scripts folder to take its place (run these from within the backend directory):
rm -rf sd_scripts
mv sd-scripts sd_scripts
This step ensures that the project directory structure is correct and that the scripts can run without issues.
Setting Up Python 3.10.14 on Ubuntu 24.04 with Pyenv
Ubuntu 24.04 "Noble Numbat" comes with Python 3.12.3 by default. However, for running Derrian's LoRA Easy Training Scripts, you need to use Python 3.10.x. As of August 2024, the most recent version of Python 3.10 is Python 3.10.14, released on March 19, 2024. This version is in the "security fixes only" phase, so it primarily receives security updates.
To set up Python 3.10.14 on your system using pyenv, follow these steps:
Step 1: Install Pyenv
Before installing pyenv, you need to install several dependencies that are required for building Python versions:
sudo apt update
sudo apt install -y make build-essential libssl-dev zlib1g-dev \
    libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
    libncurses-dev xz-utils tk-dev \
    libffi-dev liblzma-dev git
(Note: the python-openssl package listed in older guides no longer exists on Ubuntu 24.04 and is not needed, and libncurses-dev replaces the old libncurses5-dev/libncursesw5-dev packages.)
pyenv is now ready to be installed on your system; install it by running the following command:
curl https://pyenv.run | bash
After installing pyenv, add the following lines to your shell profile (~/.bashrc, ~/.zshrc, etc.):
export PATH="$HOME/.pyenv/bin:$PATH"
eval "$(pyenv init --path)"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
Reload your shell configuration:
source ~/.bashrc # or source ~/.zshrc
Step 2: Install Python 3.10.14
With pyenv
installed, you can now install Python 3.10.14 to run the training scripts:
pyenv install 3.10.14
You may see warnings about missing optional modules during the build, but you should be able to safely ignore them, as the affected modules are not required by the GUI or training scripts.
Step 3: Set the Local Directory to Use Python 3.10.14
Navigate to the LoRA_Easy_Training_Scripts project directory where you want to use Python 3.10.14, and set it as the local Python version:
cd /path/to/LoRA_Easy_Training_Scripts
pyenv local 3.10.14
Make sure that pyenv is properly configured in your shell. You can verify this by checking that the pyenv command is recognized:
pyenv --version
This prints the version of pyenv itself, which confirms it is installed and on your PATH. To confirm that the local Python version took effect, run:
python --version
This should report Python 3.10.14.
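If you want an extra guard against accidentally launching the scripts under the wrong interpreter, a small stdlib-only check like the one below can be dropped at the top of a launcher script. This helper is a sketch of mine, not part of the project:

```python
import sys

def is_supported_python(version=None) -> bool:
    """Return True when running under Python 3.10.x, the series the
    training scripts expect (3.12, Ubuntu 24.04's default, is too new)."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) == (3, 10)

# Checking explicit version tuples:
print(is_supported_python((3, 10, 14)))  # → True
print(is_supported_python((3, 12, 3)))   # → False
```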
Step 4: Create a Virtual Environment in the sd_scripts Directory
After setting the correct Python version, navigate to the sd_scripts folder within your project:
cd LoRA_Easy_Training_Scripts/backend/sd_scripts
Create a virtual environment inside the sd_scripts folder using Python 3.10:
python3.10 -m venv venv
This command creates a virtual environment in the sd_scripts directory.
Activate the virtual environment with the following command:
source venv/bin/activate
The environment must be created within the sd_scripts folder. Once created, you can activate the environment from any directory by adjusting the source path, which will be relevant in later steps. Following these steps ensures that your environment is correctly set up to run Derrian's LoRA Easy Training Scripts using Python 3.10.14 on Ubuntu 24.04 "Noble Numbat".
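To illustrate how the source path adjusts with your current directory, here is a small stdlib-only sketch (a hypothetical helper, not part of the project) that builds the activation command relative to wherever you are:

```python
import os

def activate_command(project_root: str, cwd: str) -> str:
    """Build the `source` command for the sd_scripts venv, relative to
    the directory you are currently in (illustrative helper only)."""
    activate = os.path.join(
        project_root, "backend", "sd_scripts", "venv", "bin", "activate"
    )
    return "source " + os.path.relpath(activate, cwd)

# From the project root:
print(activate_command("/home/me/LoRA_Easy_Training_Scripts",
                       "/home/me/LoRA_Easy_Training_Scripts"))
# → source backend/sd_scripts/venv/bin/activate
```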
Setting Up ROCm and PyTorch for LoRA Easy Training Scripts
To ensure optimal performance and compatibility with LoRA Easy Training Scripts, follow these steps. This guide assumes that you're using ROCm drivers version 6.1 or higher. ROCm 6.2 is recommended, though ROCm 6.1 is sufficient, since it already includes the necessary bitsandbytes functionality.
Step 1: Ensure ROCm Drivers Are Up-to-Date
Make sure your ROCm drivers are updated to at least version 6.1. ROCm 6.2 is preferred as it includes additional enhancements, but for the context of this tutorial and training process, version 6.1 is the minimum requirement.
ROCm 6.1 and 6.2: Both versions support the latest stable PyTorch and include the required bitsandbytes functionality.
Step 2: Activate the Virtual Environment
Navigate to the sd_scripts directory where your virtual environment is located, and activate it. Activating from this directory ensures that the pip requirements file is correctly located.
cd LoRA_Easy_Training_Scripts/backend/sd_scripts
source venv/bin/activate
Step 3: Install the Latest Stable PyTorch with ROCm Support
With the virtual environment activated, install the latest stable version of PyTorch that supports ROCm 6.1:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1
This command installs PyTorch, torchvision, and torchaudio with support for ROCm 6.1 or higher.
Step 4: Install Required Dependencies for Kohya_ss Scripts
Next, install the necessary dependencies for the Kohya_ss scripts by running:
pip install -r requirements.txt
This installs all the required packages listed in the requirements.txt file, ensuring the scripts can run correctly.
Step 5: Uninstall Unnecessary Packages
To avoid conflicts, uninstall xformers (if present) and bitsandbytes, as these are not required when using ROCm 6.1 or higher:
pip uninstall xformers bitsandbytes
This step ensures that only the necessary packages are installed, which is crucial for avoiding compatibility issues. If bitsandbytes is installed within the venv, the scripts will not work.
Step 6: Install Backend Requirements
Finally, install the backend requirements. This command must be run from within the sd_scripts directory with the virtual environment activated, as it includes a relative path (..) to access the correct requirements.txt:
pip install -r ../requirements.txt
This command installs additional dependencies needed for the backend scripts, ensuring that all components of your training setup are properly configured.
Finalizing the Setup with CAME Optimizer
This section will walk you through the final steps of setting up the LoRA Easy Training Scripts, including the installation of the CAME optimizer and schedulers, adjusting the environment activation script, and making necessary edits to the optimizer's initialization function.
Step 1: Install the CAME Optimizer and Schedulers
To install the CAME optimizer and schedulers, navigate to the sd_scripts directory where your virtual environment is activated, and run the following command:
pip install ../custom_scheduler/.
This command installs the CAME optimizer and related schedulers from the custom_scheduler directory.
Step 2: Install Frontend GUI Requirements
Next, navigate to the main LoRA_Easy_Training_Scripts directory and install the necessary requirements for the frontend GUI:
Navigate to the main directory:
cd LoRA_Easy_Training_Scripts
Install the frontend requirements:
pip install -r requirements.txt
Step 3: Adjust the run.sh Script and main.py
Before running the GUI, you need to make the run.sh script executable and update it to activate the correct virtual environment:
Make the script executable:
chmod +x run.sh
Edit run.sh: Open the run.sh file in a text editor, locate the line
source venv/bin/activate
and replace it with
source backend/sd_scripts/venv/bin/activate
This ensures that the correct virtual environment is activated when you run the script.

Edit main.py: Open the main.py file in any editor and replace its entire contents with the following (just select all, then copy and paste) to ensure that the backend and the GUI can communicate correctly:

import subprocess
import time
from pathlib import Path
import sys
import json

from PySide6 import QtWidgets
from qt_material import apply_stylesheet
import requests

from main_ui_files.MainWindow import MainWindow


def run_backend():
    command = "./backend/sd_scripts/venv/bin/python ./backend/main.py backend"
    print(f"Running command: {command}")
    # Run the command asynchronously
    process = None
    try:
        process = subprocess.Popen(command, shell=True)
    except subprocess.CalledProcessError as e:
        print(f"Failed to start the backend: {e}")
    # Wait for 5 seconds to ensure the backend starts
    time.sleep(5)
    return process  # Return the process object in case you need to interact with it later


def CreateConfig():
    return {
        "theme": {
            "location": Path("css/themes/dark_teal.xml").as_posix(),
            "is_light": False,
        }
    }


def main() -> None:
    # Start the backend asynchronously before initializing the GUI
    backend_process = run_backend()
    queue_store = Path("queue_store")
    if not queue_store.exists():
        queue_store.mkdir()
    config = Path("config.json")
    config_dict = json.loads(config.read_text()) if config.exists() else CreateConfig()
    if "theme" not in config_dict:
        config_dict.update(CreateConfig())
    config.write_text(json.dumps(config_dict, indent=2))
    app = QtWidgets.QApplication(sys.argv)
    if config_dict["theme"]["location"]:
        apply_stylesheet(
            app,
            theme=config_dict["theme"]["location"],
            invert_secondary=config_dict["theme"]["is_light"],
        )
    window = MainWindow(app)
    window.setWindowTitle("LoRA Trainer")
    window.show()
    app.exec()
    config_dict = json.loads(config.read_text())
    if not config_dict.get("run_local"):
        return
    if window.main_widget.training_thread:
        while window.main_widget.training_thread.is_alive():
            time.sleep(5.0)
        requests.get(f"{window.main_widget.backend_url_input.text()}/stop_server")
    # Optionally, terminate the backend process when the GUI is closed
    if backend_process is not None:
        backend_process.terminate()


if __name__ == "__main__":
    main()
Step 4: Edit the CAME Optimizer Initialization
If you plan to use the CAME optimizer, which is recommended, you'll need to modify its initialization function to properly initialize the step counting variable.
Navigate to the CAME optimizer file: The file you need to edit is likely located in the virtual environment's site-packages directory:
cd /path/to/your/LoRA_Easy_Training_Scripts/backend/sd_scripts/venv/lib/python3.10/site-packages/LoraEasyCustomOptimizer
Edit came.py: Open the came.py file and locate the __init__ function. Add the following line to initialize the _step_count attribute:

def __init__(
    self,
    params: PARAMETERS,
    lr: float = 2e-4,
    betas: BETAS = (0.9, 0.999, 0.9999),
    weight_decay: float = 0.0,
    weight_decouple: bool = True,
    fixed_decay: bool = False,
    clip_threshold: float = 1.0,
    ams_bound: bool = False,
    eps1: float = 1e-30,
    eps2: float = 1e-16,
):
    # Set the _step_count attribute during initialisation by adding this line:
    self._step_count = 0

    self.validate_learning_rate(lr)
    self.validate_betas(betas)
    self.validate_non_negative(weight_decay, "weight_decay")
    self.validate_non_negative(eps1, "eps1")
    self.validate_non_negative(eps2, "eps2")

    self.clip_threshold = clip_threshold
    self.eps1 = eps1
    self.eps2 = eps2
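To see why this one line matters: PyTorch-style LR schedulers read a _step_count attribute from the optimizer they wrap, so an optimizer that never sets it fails with an AttributeError. The toy classes below only mimic the CAME/Rex pairing to demonstrate the mechanism; they are not the real implementations:

```python
class SketchOptimizer:
    """Stands in for CAME. Without the line added in __init__,
    _step_count would never exist and the scheduler below would fail."""

    def __init__(self):
        self._step_count = 0  # the line the guide adds to came.py

    def step(self):
        self._step_count += 1  # a real optimizer also updates parameters


class SketchScheduler:
    """Stands in for the Rex scheduler: it inspects the optimizer's
    _step_count to track step ordering."""

    def __init__(self, optimizer):
        if not hasattr(optimizer, "_step_count"):
            raise AttributeError("optimizer is missing _step_count")
        self.optimizer = optimizer

    def step(self):
        return self.optimizer._step_count


opt = SketchOptimizer()
sched = SketchScheduler(opt)  # would raise without the added line
opt.step()
print(sched.step())  # → 1
```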
Final Steps
Once all the steps above are completed, your training GUI should work seamlessly with the CAME optimizer. If you encounter any issues with the TOML file, consider adjusting the dataset directory as a potential workaround. Keep an eye out for updates that may offer more concrete solutions to this issue. I am currently further exploring this issue.
(I'm still updating the section on using the GUI with the scripts, but it's pretty straightforward; if you got this far, I'm sure you'll manage.)
Training Settings:
People like to say that when it comes to settings, everything is different, it's all trial and error, etc. I disagree. I will update this for realistic models, but for Pony/anime/style and concept models I have used exclusively the same settings with success. The VAE can be downloaded on CIVITAI if you don't already have it separately. These settings should work fine on any new AMD card with at least 12 GB of memory:
This section is still under development; however, most people coming here are probably looking for training settings that go with the optimizer, so here are mine. Just copy the box below into a text file, save it as *.toml, load it in the GUI, and adjust the filenames accordingly (these settings are still current as of 31/08/2024):
[[subsets]]
caption_extension = ".txt"
image_dir = "/image_dir"
keep_tokens = 1
name = "dataset"
num_repeats = 3
shuffle_caption = true
[train_mode]
train_mode = "lora"
[general_args.args]
max_data_loader_n_workers = 1
persistent_data_loader_workers = true
pretrained_model_name_or_path = "/PonyDiffusionV6XL.safetensors"
sdxl = true
no_half_vae = true
mixed_precision = "fp16" #bf16 may be faster on newer cards, I see little difference in output quality however
gradient_checkpointing = true
gradient_accumulation_steps = 1
seed = 119
max_token_length = 225
prior_loss_weight = 1.0
sdpa = true
max_train_epochs = 8
cache_latents = true
vae = "/sdxlVAE.safetensors"
[general_args.dataset_args]
resolution = 1024
batch_size = 2
#dim and alpha can be put up to 16/8 for realistic or more complex models
[network_args.args]
network_dropout = 0.1
network_dim = 8
network_alpha = 4.0
min_timestep = 0
max_timestep = 1000
[optimizer_args.args]
lr_scheduler = "cosine"
optimizer_type = "Came"
lr_scheduler_type = "LoraEasyCustomOptimizer.CustomOptimizers.Rex"
loss_type = "l2"
learning_rate = 0.0001
warmup_ratio = 0.05
unet_lr = 0.0001
text_encoder_lr = 1e-6
max_grad_norm = 1.0
min_snr_gamma = 5 #change to 0 for realistic models
[saving_args.args]
save_precision = "fp16"
save_model_as = "safetensors"
save_every_n_epochs = 1
save_last_n_epochs = 3
output_dir = "/stable-diffusion-webui-forge/models/Lora/"
output_name = "lora_name_(no_extension)"
[noise_args.args]
noise_offset = 0.0357
multires_noise_iterations = 5
multires_noise_discount = 0.25
[bucket_args.dataset_args]
enable_bucket = true
min_bucket_reso = 320
max_bucket_reso = 2048
bucket_reso_steps = 64
[network_args.args.network_args]
[optimizer_args.args.lr_scheduler_args]
min_lr = 1e-6
[optimizer_args.args.optimizer_args]
weight_decay = "0.02"
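To sanity-check what these settings mean for training time, the total optimizer step count follows directly from the dataset block: images × num_repeats, divided by batch_size, times max_train_epochs. A quick stdlib calculation, using the 70-image dataset from the intro as the example (your image count will differ, and bucketing can shift the result slightly since batches are formed per bucket):

```python
import math

def total_steps(images: int, repeats: int, batch_size: int,
                epochs: int, grad_accum: int = 1) -> int:
    """Approximate optimizer steps for the TOML above."""
    steps_per_epoch = math.ceil(images * repeats / batch_size)
    return math.ceil(steps_per_epoch / grad_accum) * epochs

# 70 images, num_repeats = 3, batch_size = 2, max_train_epochs = 8
print(total_steps(70, 3, 2, 8))  # → 840
```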
GETTING STARTED (A WORK IN PROGRESS)
THE TRAINING SET
I will update this section; I'm currently using a script I wrote that automatically crops and grabs matching images from downloaded video files. The script needs a little more work, as the tagging, grabbing, and sorting are still touchy, but soon I should have a script that grabs all the different characters from a group of input videos, crops them, and sorts them by automatically selecting the most distinct images. It also tags the images to a set threshold and tag count in the same manner as the CIVITAI trainer, except it allows for simple tag editing and redundancy: for example, all tags related to a shirt (t-shirt, shirt, dress shirt, red shirt, etc.) can be consolidated into a single "red shirt" tag. It will have options to match costumes as well. However, it is all script based, as I can't be bothered writing a GUI for it; if anyone wants to, please feel free, and I will share the script when it's done. I already use it for my LoRAs, which is how I created all my recent ones for characters with little to no fan art on the various booru sites, but it still needs some manual input and is currently far from idiot proof.