How I've been creating XL Loras lately

I've been training on as many images as possible lately, up to 1000, which is the maximum that Civitai accepts. Capturing an art style well really does take this many images. That means gathering as many reasonably large images as I can, picking through them, and using the best ones for training. I've used a few tools to extract individual images from various sources.

ComicPanelSplitter

This tool takes image files, detects the individual comic panels in each one, and saves them all separately. It can get stuck on mostly white images, so I remove those from the source set before running the command. Training needs images that are unique. If you leave a bunch of comic panels on a page it will not train well for specific things, and you will end up generating pages of comic panels instead of the art style within the panels. You still have the issue of speech bubbles, but that can usually be taken care of with a negative prompt.

Here is a simple batch file that I used to process all the .jpg files in a directory.

@echo off
rem Source pages, destination for the split panels, and the splitter executable.
set "input_directory=A:\pages"
set "output_directory=A:\panels"
set "dos_tool_path=A:\ComicPanelsSplitter.exe"

echo "%input_directory%"
echo "%output_directory%"
echo "%dos_tool_path%"

if not exist "%output_directory%" mkdir "%output_directory%"

for %%i in ("%input_directory%\*.jpg") do (
    echo Processing "%%i"
    "%dos_tool_path%" "%%i" "%output_directory%"
)

echo Batch processing complete.
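
Since the splitter can get stuck on mostly white pages, it may help to flag those pages with a script instead of weeding them out by hand. This is only a rough Python sketch: it assumes the third-party Pillow library is installed, and the 245 brightness cutoff and 95% limit are guesses to tune, not values taken from the splitter tool. The function names are made up for this example.

```python
from pathlib import Path


def white_fraction(pixels, threshold=245):
    """Return the share of 8-bit grayscale values at or above `threshold`."""
    values = list(pixels)
    if not values:
        return 0.0
    return sum(1 for v in values if v >= threshold) / len(values)


def is_mostly_white(pixels, threshold=245, max_white=0.95):
    """True when more than `max_white` of the pixels count as white."""
    return white_fraction(pixels, threshold) > max_white


def find_mostly_white_pages(directory, pattern="*.jpg"):
    """List image files in `directory` that look mostly white (needs Pillow)."""
    from PIL import Image  # third-party: pip install Pillow

    flagged = []
    for path in sorted(Path(directory).glob(pattern)):
        with Image.open(path) as img:
            # Shrink to a small grayscale copy so the check stays fast on big scans.
            small = img.convert("L").resize((64, 64))
            # In mode "L" each byte is one pixel, so the bytes can be checked directly.
            if is_mostly_white(small.tobytes()):
                flagged.append(path)
    return flagged
```

Calling find_mostly_white_pages(r"A:\pages") returns the pages to move out of the folder before running the batch file. It only reports them and does not delete anything.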

pdfimages

This tool takes a PDF file as input and outputs the image files contained in it. Some PDFs store the images and text separately, so this is very useful for pulling out just the images. You want as many high quality images as you can get. Download the Xpdf command line tools and extract them, and then you can find this tool in the following location. Be sure to use the -j option, otherwise every image is written as a rather large uncompressed PBM, PGM or PPM file. With -j, images stored in the PDF as JPEG are saved as .jpg files. Be aware that it may not work on all PDF files, or may only partially work.

A:\> cd xpdf-tools-win-4.04\bin64
A:\xpdf-tools-win-4.04\bin64>pdfimages -j input.pdf A:\pdfimages
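
If you have a whole folder of PDFs, the same command can be looped from a script. This is a minimal Python sketch, assuming pdfimages is on the PATH (otherwise pass the full path to the exe). The function names and the idea of using each PDF's file name as the image root are choices made for this example.

```python
import subprocess
from pathlib import Path


def build_pdfimages_command(pdf_path, output_dir, exe="pdfimages"):
    """Build the argument list for one PDF.

    The last argument is the image root: pdfimages appends a number and
    an extension to it for every image it writes.
    """
    pdf_path = Path(pdf_path)
    image_root = Path(output_dir) / pdf_path.stem
    return [exe, "-j", str(pdf_path), str(image_root)]


def extract_all(pdf_dir, output_dir, exe="pdfimages"):
    """Run pdfimages on every PDF in `pdf_dir`; return the PDFs that failed."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        result = subprocess.run(build_pdfimages_command(pdf, output_dir, exe))
        # A non-zero exit code means pdfimages reported a problem with this file.
        if result.returncode != 0:
            failed.append(pdf)
    return failed
```

For example, extract_all(r"A:\pdfs", r"A:\pdfimages") would process a hypothetical A:\pdfs folder and return the files that need a second look, which fits the warning that not every PDF works.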

magick

Sometimes when you extract the images from a PDF, a single image will be split across two pages. You can use this tool to combine the image pairs back together.

magick convert +append input1.jpg input2.jpg output.jpg

If you want to combine images vertically, you can use this command instead:

magick convert -append input1.jpg input2.jpg output.jpg

Here is a short PowerShell script to combine even and odd page images into combined output images. The first line shows how to call it and the script itself follows.

A:\combineall.ps1 .\images\*.jpg

param (
    [Alias("FullName")]
    [String[]]$InputPattern
)

Write-Host "InputImages $InputPattern"

$inputCount = $InputPattern.Count

Write-Host "inputCount $inputCount"

$OutputDirectory = "Output"
if (-not (Test-Path $OutputDirectory)) {
    New-Item -ItemType Directory -Path $OutputDirectory
}

# @() keeps the result an array even when only one file matches.
$InputImages = @(Get-ChildItem -Path $InputPattern -Include *.jpg | ForEach-Object { $_.FullName })

$TotalPairs = [math]::Ceiling($InputImages.Count / 2)
Write-Host "TotalPairs $TotalPairs"

for ($i = 0; $i -lt $TotalPairs; $i++) {
    Write-Host "pair $i"

    $Input1 = $InputImages[$i * 2]
    Write-Host "Input1 $Input1"

    # With an odd number of images the last one has no partner, so skip it.
    if (($i * 2 + 1) -ge $InputImages.Count) {
        Write-Host "No partner for $Input1, skipping."
        continue
    }

    $Input2 = $InputImages[$i * 2 + 1]
    Write-Host "Input2 $Input2"

    # Name the output after the current time, down to the millisecond.
    $DateTime = Get-Date -Format "yyyyMMdd_HHmmss_fff"
    $Output = Join-Path $OutputDirectory "output_$DateTime.jpg"

    magick convert +append "$Input1" "$Input2" "$Output"

    Write-Host "Stitching complete for pair $Input1 and $Input2. Output saved to $Output"
}

Write-Host "All pairs stitched."

ffmpeg

This tool can be used to extract frames from a video file. Be sure to get as high a quality source video as possible. Take the total length of the video in seconds and divide by 1000 to work out how many seconds to leave between extracted frames, then pick and choose frames from there.

ffmpeg -ss 00:30 -i input.mp4 -start_number 0 -vf fps=1/60 "B 00-%02d-00.000.png"

-ss is the starting offset, so in this example it starts extracting frames 30 seconds into the video.

fps=1/60 sets how often a frame is extracted, written as one frame per that many seconds. In this case it is one frame every 60 seconds. Using fps=1/10 would be one frame every 10 seconds.
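
The divide-by-1000 rule can be worked out with a few lines of Python. This is just a sketch of the arithmetic with made-up function names. It rounds the interval up to whole seconds so you end up with no more than the target number of frames.

```python
import math


def seconds_between_frames(duration_seconds, target_frames=1000):
    """Whole seconds between extracted frames, never less than 1."""
    if duration_seconds <= 0 or target_frames <= 0:
        raise ValueError("duration and target must be positive")
    return max(1, math.ceil(duration_seconds / target_frames))


def fps_filter(duration_seconds, target_frames=1000):
    """Return the value to pass to -vf, for example 'fps=1/6'."""
    return f"fps=1/{seconds_between_frames(duration_seconds, target_frames)}"


# A 90 minute film is 5400 seconds: 5400 / 1000 = 5.4, rounded up to 6.
print(fps_filter(90 * 60))  # fps=1/6
```

With fps=1/6 a 90 minute film gives about 900 frames to pick from.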

ffmpeg can also be used to crop a video before capturing screenshots. First have it detect the crop size and position with the cropdetect filter, which prints lines like the one shown after the first command, and then crop the video with the crop filter to remove black borders or watermarks.

ffmpeg -i input.webm -t 1 -vf cropdetect -f null -

[Parsed_cropdetect_0 @ 0000022ee0f41840] x1:3839 x2:0 y1:2159 y2:0 w:-3824 h:-2144 x:3834 y:2154 pts:701 t:0.701000 crop=-3824:-2144:3834:2154

ffmpeg -i input.webm -vf "crop=3450:2160:0:0" output.webm
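
cropdetect prints one suggestion per frame, and some of them are unusable (negative sizes typically show up on frames where it has not found a picture yet, as in the log line above). A small Python sketch like this can pick the most common usable value out of the log. The function name is made up for this example, and the second log in the usage note is hypothetical, not real ffmpeg output.

```python
import re
from collections import Counter

# Matches the "crop=w:h:x:y" part at the end of each cropdetect log line.
CROP_RE = re.compile(r"crop=(-?\d+):(-?\d+):(-?\d+):(-?\d+)")


def pick_crop(log_text):
    """Return the most frequent usable 'crop=w:h:x:y' value in the log, or None."""
    counts = Counter()
    for match in CROP_RE.finditer(log_text):
        w, h, x, y = (int(g) for g in match.groups())
        # Skip suggestions with negative sizes or offsets; they cannot be used.
        if w > 0 and h > 0 and x >= 0 and y >= 0:
            counts[f"crop={w}:{h}:{x}:{y}"] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Given the single log line above it returns None, because the only suggestion has negative sizes. Given a hypothetical log containing crop=3450:2160:0:0 twice and crop=3456:2160:0:0 once, it returns crop=3450:2160:0:0. ffmpeg writes its log to stderr, so capture that stream if you run it from a script, and detect over a longer stretch than one second if nothing usable comes back.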

BulkRenameHere

Civitai likes its training files to be named in a particular way when you zip them up. This tool quickly renames a set of image files to 000.jpg through 999.jpg: just select the files and use the context menu to bulk rename them. I use the option to remove the existing file name, add the suffix .jpg, and enable numbering with the leading zero option. It can't handle 1000 files at once, so I usually rename them in two steps.
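
If the two-step rename gets tedious, the same numbering can be done for any number of files with a short script. This is a hedged Python sketch and not part of BulkRenameHere. The function name is made up, and it copies into a separate folder instead of renaming in place so the originals stay untouched.

```python
import shutil
from pathlib import Path


def copy_numbered(source_dir, dest_dir, start=0, width=3):
    """Copy every .jpg in `source_dir` to `dest_dir` as 000.jpg, 001.jpg, ...

    Files are taken in sorted name order. Returns the list of new paths.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    sources = sorted(
        p for p in Path(source_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".jpg"
    )
    created = []
    for number, src in enumerate(sources, start=start):
        # width=3 pads with leading zeros: 0 -> 000, 42 -> 042, 999 -> 999.
        target = dest / f"{number:0{width}d}.jpg"
        shutil.copy2(src, target)
        created.append(target)
    return created
```

Use a destination folder that is different from the source folder, otherwise a copy could overwrite a file that has not been processed yet.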

kohya_ss

Useful for adding captions to your training images. However, it has a bug that stops captioning from working at the moment, so you need to fix it manually. See this link for a fix. I can also use it to do my own training locally, but I don't have as much VRAM as Civitai can provide, so I can only do XL Lora training at 768x768 max with about 20 images before it gets painfully slow on my 12GB Nvidia 3060. On Civitai I can give it 1000 1024x1024 images and run it for lots of steps and it only takes an hour or so. So contribute more buzz if you want me to do more stuff, or spiff me a 4090 if you have one :-) As for the quality of the captions, I have hand crafted the captions for at least 10% of the images in some cases, but I have also gotten good results without doing much captioning at all, or with just a generic "by artist" or "style xxxx" type of caption.

Putting these tools together, you can get a bunch of images captioned and into a zip file that is easy to upload to the Civitai Lora training service. I use the defaults for the most part, but I try to up the number of repeats until it costs a bit more than 500 buzz, which is the default cost of a training run. Be aware that training may get stuck, restart, or fail, so sometimes it helps to download the epochs as they are generated so you at least have something to try if it fails. They disappear if it fails or restarts. Also remember to download the training data so you can try again if you want to. Don't despair if the sample images look terrible, because using the Lora with a better model will likely fix that.
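
The zipping step can be scripted too. This Python sketch assumes each caption is a .txt file with the same base name as its image and that everything sits at the top level of the zip. Both are assumptions about the layout, not something confirmed by Civitai here, and the function name is made up.

```python
import zipfile
from pathlib import Path


def build_training_zip(image_dir, zip_path):
    """Zip every .jpg in `image_dir` plus a same-named .txt caption if present.

    Files are stored at the top level of the archive. Returns the names added.
    """
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for image in sorted(Path(image_dir).glob("*.jpg")):
            archive.write(image, arcname=image.name)
            added.append(image.name)
            # Captions are optional, so only add the ones that exist.
            caption = image.with_suffix(".txt")
            if caption.exists():
                archive.write(caption, arcname=caption.name)
                added.append(caption.name)
    return added
```

Keeping a copy of this zip also covers the advice about holding on to the training data in case a run fails.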

Hope this helps someone with doing some training.
