
Offline LoRA training on AMD GPU, ROCm 6.2 and Ubuntu 24.04 "Noble Numbat" - UPDATED AUGUST 2024

UPDATED (31/08/24):


Welcome to my streamlined and updated guide on training LoRAs using AMD GPUs. This guide is specifically tailored for environments running ROCm 6.2 on the latest Ubuntu 24.04 "Noble Numbat" release, incorporating the latest PyTorch libraries.

WHAT'S NEW:

  • ROCm 6.2 Integration: Full support for the latest ROCm drivers, ensuring optimal performance on AMD GPUs.

  • Updated for Ubuntu 24.04: Tailored installation instructions for the newest Ubuntu "Noble Numbat" release.

  • Latest PyTorch Compatibility: Fully compatible with the most recent PyTorch version for seamless training experiences.

SETTING UP THE TRAINING ENVIRONMENT:

I won't lie, this can be a ball ache, but once done it works, and works well for the most part. Even on my older 6900 XT I can train a high-quality LoRA in around 40 minutes or so (8 epochs, 70 images; probably overkill for an anime model using the CAME optimizer), realistic or otherwise.

Installing Derrian's LoRA Easy Training Scripts (Dev Branch)

To set up Derrian's LoRA Easy Training Scripts, especially for AMD GPUs, follow these steps. While the stable branch might work with the latest updates, we’ll use the dev branch as it's a proven method. You can experiment with the recently updated stable branch if you'd like.

Step 1: Clone the Repository (Dev Branch)

First, clone the dev branch of Derrian's LoRA Easy Training Scripts by typing the following command in your terminal:

git clone -b dev https://github.com/derrian-distro/LoRA_Easy_Training_Scripts.git

This command downloads the necessary files from the repository into your local machine.

Step 2: Navigate to the Project Directory

After cloning the repository, switch into the newly created directory:

cd LoRA_Easy_Training_Scripts

Step 3: Initialize and Update Git Submodules

Within the project directory, you need to initialize and update the submodules:

git submodule init 
git submodule update

These commands ensure that all the necessary submodules, which are dependencies for the main project, are correctly set up.

Step 4: Manually Clone the Kohya_ss Scripts

For some reason, the Kohya scripts may not pull through as a submodule, so they need to be cloned manually:

cd backend
git clone https://github.com/kohya-ss/sd-scripts.git

After cloning the Kohya_ss scripts, you'll need to perform a bit of cleanup:

  1. Delete the empty sd_scripts directory that was initially created in the backend folder.

  2. Rename the newly cloned sd-scripts folder to match the just-deleted sd_scripts directory name.

This step ensures that the project directory structure is correct and that the scripts can run without issues.
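The cleanup above can be sketched as two shell commands. The snippet below runs them in a sandbox so the effect is visible (the "demo" directory is a stand-in for LoRA_Easy_Training_Scripts/backend; "library" stands in for the cloned repository's contents):

```shell
# "demo" stands in for LoRA_Easy_Training_Scripts/backend
mkdir -p demo/sd_scripts            # empty dir left by the submodule step
mkdir -p demo/sd-scripts/library    # stands in for the cloned kohya repo
cd demo
rm -rf sd_scripts                   # 1. delete the empty directory
mv sd-scripts sd_scripts            # 2. rename the clone to the expected name
ls sd_scripts                       # prints: library
```

In the real repository you would run only the last three commands from inside the backend folder.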

Setting Up Python 3.10.14 on Ubuntu 24.04 with Pyenv

Ubuntu 24.04 "Noble Numbat" comes with Python 3.12.3 by default. However, for running Derrian's LoRA Easy Training Scripts, you need to use Python 3.10.x. As of August 2024, the most recent version of Python 3.10 is Python 3.10.14, released on March 19, 2024. This version primarily focuses on security updates and is in the "security fixes only" phase.

To set up Python 3.10.14 on your system using pyenv, follow these steps:

Step 1: Install Pyenv

Before installing pyenv, you need to install several dependencies that are required for building Python versions:

sudo apt update 
sudo apt install -y make build-essential libssl-dev zlib1g-dev \
  libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
  libncurses5-dev libncursesw5-dev xz-utils tk-dev \
  libffi-dev liblzma-dev python3-openssl git

pyenv is now ready to be installed on your system. Install it by running the following command:

curl https://pyenv.run | bash

After installing pyenv, add the following lines to your shell profile (~/.bashrc, ~/.zshrc, etc.):

export PATH="$HOME/.pyenv/bin:$PATH" 
eval "$(pyenv init --path)" 
eval "$(pyenv init -)" 
eval "$(pyenv virtualenv-init -)"

Reload your shell configuration:

source ~/.bashrc # or source ~/.zshrc

Step 2: Install Python 3.10.14

With pyenv installed, you can now install Python 3.10.14 to run the training scripts:

pyenv install 3.10.14

You may get errors about missing optional dependencies during the build, but you should be able to safely ignore them, as they are not required by the GUI or training scripts.

Step 3: Set the Local Directory to Use Python 3.10.14

Navigate to the LoRA_Easy_Training_Scripts project directory where you want to use Python 3.10.14, and set it as the local Python version:

cd /path/to/LoRA_Easy_Training_Scripts
pyenv local 3.10.14

Make sure that pyenv is properly configured in your shell and that the local version took effect. You can verify this by running:

pyenv version
python --version

If pyenv version shows 3.10.14 and python --version returns Python 3.10.14, then pyenv is active and functioning as expected. (Note that pyenv --version prints the version of pyenv itself, not of Python.)

Step 4: Create a Virtual Environment in the sd_scripts Directory

After setting the correct Python version, navigate to the sd_scripts folder within your project:

cd LoRA_Easy_Training_Scripts/backend/sd_scripts

Create a virtual environment inside the sd_scripts folder using Python 3.10:

python3.10 -m venv venv

This command creates a virtual environment in the sd_scripts directory.

Activate the virtual environment with the following command:

source venv/bin/activate

The environment must be created within the sd_scripts folder. Once created, you can activate the environment from any directory by adjusting the source path, which will be relevant in later steps. Following these steps ensures that your environment is correctly set up to run Derrian's LoRA Easy Training Scripts using Python 3.10.14 on Ubuntu 24.04 "Noble Numbat".
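As a quick sanity check that activation actually rewires python to the venv's interpreter, here is a sandbox sketch (demo_venv is a stand-in name; --without-pip just keeps the throwaway venv lightweight, and is not needed for the real one):

```shell
# Create and activate a throwaway venv, then see where "python" resolves
python3 -m venv --without-pip demo_venv
source demo_venv/bin/activate
command -v python    # resolves to .../demo_venv/bin/python
```

Running the same command -v python check after activating the real sd_scripts venv should point into backend/sd_scripts/venv/bin.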

Setting Up ROCm and PyTorch for LoRA Easy Training Scripts

To ensure optimal performance and compatibility with LoRA Easy Training Scripts, follow these steps. This guide assumes that you're using ROCm drivers version 6.1 or higher. ROCm 6.2 is recommended, though ROCm 6.1 is sufficient, since it includes the necessary bitsandbytes functionality.

Step 1: Ensure ROCm Drivers Are Up-to-Date

Make sure your ROCm drivers are updated to at least version 6.1. ROCm 6.2 is preferred as it includes additional enhancements, but for the context of this tutorial and training process, version 6.1 is the minimum requirement.

  • ROCm 6.1 and 6.2: Both versions support the latest stable PyTorch and include the required bitsandbytes functionality.

Step 2: Activate the Virtual Environment

Navigate to the sd_scripts directory where your virtual environment is located, and activate it. Activating from this directory ensures that the pip requirements file is correctly located.

cd LoRA_Easy_Training_Scripts/backend/sd_scripts
source venv/bin/activate

Step 3: Install the Latest Stable PyTorch with ROCm Support

With the virtual environment activated, install the latest stable version of PyTorch that supports ROCm 6.1:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1

This command installs PyTorch, torchvision, and torchaudio with support for ROCm 6.1 or higher.

Step 4: Install Required Dependencies for Kohya_ss Scripts

Next, install the necessary dependencies for the Kohya_ss scripts by running:

pip install -r requirements.txt

This installs all the required packages listed in the requirements.txt file, ensuring the scripts can run correctly.

Step 5: Uninstall Unnecessary Packages

To avoid conflicts and unnecessary packages, uninstall xformers (if found) and bitsandbytes, as these are not required when using ROCm 6.1 or higher:

pip uninstall xformers bitsandbytes

This step ensures that only the necessary packages are installed, which is crucial for avoiding compatibility issues. If bitsandbytes is installed within the venv, the scripts will not work.
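With the venv active, you can confirm the uninstall worked by filtering pip list for the two packages. A count of 0 means neither package survives in the environment (the grep pattern below is just one way to do the check):

```shell
# Count any surviving xformers / bitsandbytes installs in the active venv
leftover=$(pip list 2>/dev/null | grep -icE '^(xformers|bitsandbytes)\b' || true)
echo "conflicting packages found: ${leftover:-0}"
```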

Step 6: Install Backend Requirements

Finally, install the backend requirements. This command must be run from within the sd_scripts directory where the virtual environment has been activated, as it includes a relative path (..) to access the correct requirements.txt:

pip install -r ../requirements.txt

This command installs additional dependencies needed for the backend scripts, ensuring that all components of your training setup are properly configured.
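The relative path works because backend/requirements.txt sits one level above sd_scripts. This sandbox sketch (LoRA_demo and demo-package are stand-in names) shows how ../requirements.txt resolves from inside sd_scripts:

```shell
# Stand-in layout: backend/requirements.txt next to backend/sd_scripts/
mkdir -p LoRA_demo/backend/sd_scripts
echo "demo-package" > LoRA_demo/backend/requirements.txt
cd LoRA_demo/backend/sd_scripts
# From here, ../requirements.txt is backend/requirements.txt
cat ../requirements.txt    # prints: demo-package
```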

Finalizing the Setup with CAME Optimizer

This section will walk you through the final steps of setting up the LoRA Easy Training Scripts, including the installation of the CAME optimizer and schedulers, adjusting the environment activation script, and making necessary edits to the optimizer's initialization function.

Step 1: Install the CAME Optimizer and Schedulers

To install the CAME optimizer and schedulers, navigate to the sd_scripts directory where your virtual environment is activated, and run the following command:

pip install ../custom_scheduler/.

This command installs the CAME optimizer and related schedulers from the custom_scheduler directory.

Step 2: Install Frontend GUI Requirements

Next, navigate to the main LoRA_Easy_Training_Scripts directory and install the necessary requirements for the frontend GUI:

  1. Navigate to the main directory:

    cd LoRA_Easy_Training_Scripts
  2. Install the frontend requirements:

    pip install -r requirements.txt

Step 3: Adjust the run.sh Script and main.py

Before running the GUI, you need to make the run.sh script executable and update it to activate the correct virtual environment:

  1. Make the script executable:

    chmod +x run.sh
  2. Edit run.sh:

    Open the run.sh file in a text editor, locate the line source venv/bin/activate, and replace it with source backend/sd_scripts/venv/bin/activate. This ensures that the correct virtual environment is activated when you run the script.

  3. Open the main.py file in any editor and replace its contents with the following (select all, then copy and paste) to ensure that the backend and the GUI can communicate correctly:

    import subprocess
    import time
    from pathlib import Path
    import sys
    import json
    from PySide6 import QtWidgets
    from qt_material import apply_stylesheet
    import requests
    from main_ui_files.MainWindow import MainWindow
    
    def run_backend():
        command = "./backend/sd_scripts/venv/bin/python ./backend/main.py backend"
        print(f"Running command: {command}")
    
        # Run the command asynchronously; Popen raises OSError (not
        # CalledProcessError) if the interpreter cannot be started
        process = None
        try:
            process = subprocess.Popen(command, shell=True)
        except OSError as e:
            print(f"Failed to start the backend: {e}")
    
        # Wait for 5 seconds to ensure the backend starts
        time.sleep(5)
        return process  # Return the process object in case you need to interact with it later
    
    def CreateConfig():
        return {
            "theme": {
                "location": Path("css/themes/dark_teal.xml").as_posix(),
                "is_light": False,
            }
        }
    
    def main() -> None:
        # Start the backend asynchronously before initializing the GUI
        backend_process = run_backend()
    
        queue_store = Path("queue_store")
        if not queue_store.exists():
            queue_store.mkdir()
        config = Path("config.json")
        config_dict = json.loads(config.read_text()) if config.exists() else CreateConfig()
        if "theme" not in config_dict:
            config_dict.update(CreateConfig())
        config.write_text(json.dumps(config_dict, indent=2))
    
        app = QtWidgets.QApplication(sys.argv)
        if config_dict["theme"]["location"]:
            apply_stylesheet(
                app,
                theme=config_dict["theme"]["location"],
                invert_secondary=config_dict["theme"]["is_light"],
            )
        window = MainWindow(app)
        window.setWindowTitle("LoRA Trainer")
        window.show()
        app.exec()
    
        config_dict = json.loads(config.read_text())
        if not config_dict.get("run_local"):
            return
        if window.main_widget.training_thread:
            while window.main_widget.training_thread.is_alive():
                time.sleep(5.0)
        requests.get(f"{window.main_widget.backend_url_input.text()}/stop_server")
    
        # Optionally, you can terminate the backend process when the GUI is closed
        if backend_process is not None:
            backend_process.terminate()
    
    if __name__ == "__main__":
        main()

Step 4: Edit the CAME Optimizer Initialization

If you plan to use the CAME optimizer, which is recommended, you'll need to modify its initialization function to properly initialize the step counting variable.

  1. Navigate to the CAME optimizer file: The file you need to edit is likely located in the virtual environment's site-packages directory:

    cd /path/to/your/LoRA_Easy_Training_Scripts/backend/sd_scripts/venv/lib/python3.10/site-packages/LoraEasyCustomOptimizer
  2. Edit came.py: Open the came.py file and locate the __init__ function. Add the following line to initialize the _step_count attribute:

     def __init__(
            self,
            params: PARAMETERS,
            lr: float = 2e-4,
            betas: BETAS = (0.9, 0.999, 0.9999),
            weight_decay: float = 0.0,
            weight_decouple: bool = True,
            fixed_decay: bool = False,
            clip_threshold: float = 1.0,
            ams_bound: bool = False,
            eps1: float = 1e-30,
            eps2: float = 1e-16,
        ):
            # Set the _step_count attribute during initialisation
            # by adding this line:
            self._step_count = 0
            self.validate_learning_rate(lr)
            self.validate_betas(betas)
            self.validate_non_negative(weight_decay, "weight_decay")
            self.validate_non_negative(eps1, "eps1")
            self.validate_non_negative(eps2, "eps2")
    
            self.clip_threshold = clip_threshold
            self.eps1 = eps1
            self.eps2 = eps2
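To confirm the edit took, you can grep the file for the new attribute. Below is a sandbox sketch (came_demo.py is a stand-in for the real came.py at the site-packages path shown above):

```shell
# Create a stand-in file containing the line added in the step above
printf 'def __init__(self):\n    self._step_count = 0\n' > came_demo.py
# grep prints the matching line when the attribute is present;
# run the same grep against the real came.py to verify your edit
grep "_step_count" came_demo.py
```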

Final Steps

Once all the steps above are completed, your training GUI should work seamlessly with the CAME optimizer. If you encounter any issues with the TOML file, consider adjusting the dataset directory as a potential workaround. Keep an eye out for updates that may offer more concrete solutions to this issue. I am currently further exploring this issue.

I am still writing up the section on using the GUI with the scripts, but it is pretty straightforward; if you got this far, I'm sure you will manage :)

Training Settings:

People like to say that when it comes to settings, everything is different, it's all trial and error, etc. I disagree. I will update this for realistic models, but for Pony/anime/style and concept models I have used exclusively the same settings with success. The VAE can be downloaded from Civitai if you don't already have it separately. These settings should work fine on any new AMD card with at least 12 GB of memory:

This section is still under development; however, most people coming here are probably looking for training settings that go with the optimizer, so here are mine. Copy the box below into a text file, save it as *.toml, load it in the GUI, and adjust the file names accordingly (these settings are still current as of 31/08/2024):

[[subsets]]
caption_extension = ".txt"
image_dir = "/image_dir"
keep_tokens = 1
name = "dataset"
num_repeats = 3
shuffle_caption = true

[train_mode]
train_mode = "lora"

[general_args.args]
max_data_loader_n_workers = 1
persistent_data_loader_workers = true
pretrained_model_name_or_path = "/PonyDiffusionV6XL.safetensors"
sdxl = true
no_half_vae = true
mixed_precision = "fp16" #bf16 may be faster on newer cards, I see little difference in output quality however
gradient_checkpointing = true
gradient_accumulation_steps = 1
seed = 119
max_token_length = 225
prior_loss_weight = 1.0
sdpa = true
max_train_epochs = 8
cache_latents = true
vae = "/sdxlVAE.safetensors"

[general_args.dataset_args]
resolution = 1024
batch_size = 2

#dim and alpha can be put up to 16/8 for realistic or more complex models
[network_args.args]
network_dropout = 0.1
network_dim = 8
network_alpha = 4.0
min_timestep = 0
max_timestep = 1000

[optimizer_args.args]
lr_scheduler = "cosine"
optimizer_type = "Came"
lr_scheduler_type = "LoraEasyCustomOptimizer.CustomOptimizers.Rex"
loss_type = "l2"
learning_rate = 0.0001
warmup_ratio = 0.05
unet_lr = 0.0001
text_encoder_lr = 1e-6
max_grad_norm = 1.0
min_snr_gamma = 5 #change to 0 for realistic models

[saving_args.args]
save_precision = "fp16"
save_model_as = "safetensors"
save_every_n_epochs = 1
save_last_n_epochs = 3
output_dir = "/stable-diffusion-webui-forge/models/Lora/"
output_name = "lora_name_(no_extension)"

[noise_args.args]
noise_offset = 0.0357
multires_noise_iterations = 5
multires_noise_discount = 0.25

[bucket_args.dataset_args]
enable_bucket = true
min_bucket_reso = 320
max_bucket_reso = 2048
bucket_reso_steps = 64

[network_args.args.network_args]

[optimizer_args.args.lr_scheduler_args]
min_lr = 1e-6

[optimizer_args.args.optimizer_args]
weight_decay = "0.02"
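As a sanity check on the training time quoted earlier, the settings above imply a step count you can compute directly. Assuming the 70-image dataset mentioned at the top (the image count is the only number not in the config), with num_repeats = 3, max_train_epochs = 8, and batch_size = 2:

```shell
# Optimizer steps = images * num_repeats * epochs / batch_size
# (gradient_accumulation_steps = 1, so no further division)
images=70 repeats=3 epochs=8 batch=2
steps=$(( images * repeats * epochs / batch ))
echo "total optimizer steps: $steps"    # prints: total optimizer steps: 840
```

Roughly 840 steps in about 40 minutes works out to around 3 seconds per step on a 6900 XT with these settings.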

GETTING STARTED (A WORK IN PROGRESS)

THE TRAINING SET

I will update this section soon. I am currently using a script I wrote that automatically crops and grabs matching images from downloaded video files; the script needs a little more work, as the tagging, grabbing, and sorting is still touchy. Soon, though, I should have a script that takes a group of videos as input, grabs and crops all the different characters, and then sorts them by automatically selecting the most varied images. It also tags the images to a set threshold and tag count in the same manner as the Civitai trainer, except that it allows simple tag editing and redundancy handling: for example, all tags related to a shirt (t-shirt, shirt, dress shirt, red shirt, etc.) can be consolidated into the single tag "red shirt". It will have options to match costumes as well. However, it is all script based, as I can't be bothered writing a GUI for it; if anyone wants to, please feel free. I will share it when it's done. I already use it for my LoRAs, which is how I created all my recent ones for characters that have little to no fan art on the various booru sites, but it still needs a little manual input and is currently far from idiot-proof.
