Hi !

TLDR

Some Python scripts.
Save one as script.py.
Change the path to your images directory in the script.
Run "python script.py" in CMD (so that if it crashes, the window doesn't close and you can see what happened).

Presentation

If you're anything like me and generate a lot of images, you likely have a huge directory full of stuff that you'll want to browse and sort "some day". I play a lot with LoRAs, dynamic prompts, wildcards and checkpoints, trying to see what works and what doesn't by generating a damn lot of images with X/Y/Z plots, generally with a bunch of checkpoints on one axis and some other setting on the other, with batch count 12. (It got to the point where I had to disable the final grid, because generating it froze my entire PC for 2 minutes.)

My usual prompt is something like this, only even more complicated :

__character__, __pose__ {, __settings__|} {2:: , __style_cartoon__|, __style_realistic__|} {, __random_modifier__|} {, from {side|front|above|behind}|} {<lora:someSecondaryStyleLora:{0.1|0.2|0.3}>|best quality|}

and in character.txt or pose.txt, I usually have stuff like

<lora:someLora:{0.1|0.3|0.5|0.6|0.7}> loraTriggerWord, loraOtherWords {, optional_lora_word1|} {, optional_lora_word1|}
__some_pop_culture_character__
woman, {freckles,|} __eyecolor__ eyes, __hair-color__ __hair-female__
{1girl|1boy} {__jobs__ clothes|__top_clothes__, __pants__ {, __hat__|}}

As you can see, I try to make the most of the DynamicPrompt extension to get exciting and unexpected results. Then I do stuff and 2-3 hours later I have 300-600 images to sort. Repeat the process a few times and I have a LOT of images that I'll maybe do something with some day.

But I thought I could write some scripts to help me refine my prompts and/or wildcards. After I did, I figured they might be useful to other people too, so here they are :

Please keep in mind that I'm new to Python, so these scripts are really "Quick'n'Dirty". I mainly wrote them for myself in the first place. They'll do the job but they are highly unoptimized. Since they're not supposed to run on a regular basis but rather once or twice, I probably won't bother cleaning/refactoring/optimizing the code.

Scripts

For all the scripts here :
- You HAVE TO change the 'input_dir' variable in the script and put the path to your images folder/directory
- These scripts are recursive, they'll run in the specified folder AND ALL subfolders
- You'll see some weird stuff, like .replace('\\u0000', '') or .replace('\\n', '\n'). I had some cases where the metadata was distorted or badly encoded. Some images, for example, show "Prompt not found" when put into the "PNG Info" tab of the webui, and I didn't want the script to crash, so... Quick'n'Dirty, told ya. I didn't have time to look into try/except blocks in Python yet
- Uncomment the 'prints' for debug if the script crashes
- I can try to debug some errors you might encounter with these scripts, but I'll be useless with errors linked to Python itself and/or its modules
- Some might not work if you use line breaks in your prompts
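
For the distorted metadata mentioned in the list above, here's a minimal sketch of what a try/except version could look like. It is not part of the scripts below: safe_parameters is a made-up helper name, and it assumes you hand it the image.info dictionary that Pillow gives you after Image.open(). The example uses plain dicts instead, so it can be tried without any image.

```python
def safe_parameters(info):
    """Return the cleaned "parameters" text from an image.info-style dict.

    Returns an empty string instead of crashing when the entry is
    missing or unusable. (Hypothetical helper, not from the webui.)
    """
    try:
        raw = info["parameters"]
    except (KeyError, TypeError):
        # no "parameters" entry, or info is not a dict at all
        return ""

    # assumption: some files might hold bytes instead of text
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8", errors="replace")
    if not isinstance(raw, str):
        return ""

    # drop NUL characters (what .replace('\\u0000', '') does in the scripts)
    return raw.replace("\x00", "").strip()


# plain dicts standing in for image.info
print(repr(safe_parameters({"parameters": "a cat\x00\nSteps: 20"})))
print(repr(safe_parameters({})))
```

In a real script the call would be something like safe_parameters(image.info), followed by a check for the empty string before going further.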

Extract all prompts from all images from a folder (and subfolders) into a (maybe huge) .txt file

This will extract the prompts of all the images into a single .txt file (to perform searches, or create new wildcards, or I don't know, don't hesitate to suggest usages or other scripts that would be useful).
Depending on the amount of images you have (about 45k in my case), this .txt file can be quite big. I suggest you use anything but the default Windows Notepad to open it. (I use VSCode personally)

from PIL import Image
import os
import json

input_dir = "E:/path/to/your/images"
output_file = "output.txt"
temp = []

for root, subdirs, files in os.walk(input_dir):
    for filename in files:
        if filename.endswith(".png"):
            file_path = os.path.join(root, filename)

            print(file_path)
            temp.append(file_path)
    

with open(output_file, "w", encoding='utf-8') as f:
    # Iterate over each file in the input directory
    for file_path in temp:
        # Open the image file
        image = Image.open(file_path)
        
        # Extract the prompt
        # this will extract only the positive prompt, remove .split('\n')[0] at the end for the whole metadata (positive prompt, negative, and parameters)
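        # .get() returns an empty string when the image has no "parameters" metadata, instead of crashing with a KeyError
        prompt = json.dumps(image.info.get("parameters", ""), ensure_ascii=False) [1:-1].replace('\\u0000', '').strip().replace('\\n', '\n').split('\n')[0]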
        # print(prompt.encode('utf-8'))
        
        # comment this line not to write the file path to the .txt file
        f.write(f"{file_path}\n") 

        f.write(f"{prompt}\n")
        f.write("\n")

        # Close the image file
        image.close()
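
About line breaks: the script above keeps only the first line of the metadata, so a prompt written over several lines gets cut. Here's a rough sketch of an alternative, assuming the usual layout I see in my files (prompt lines, then an optional "Negative prompt:" line, then a settings line starting with "Steps:"). split_positive_prompt is a made-up name, and a prompt line that itself starts with "Steps:" would fool it.

```python
def split_positive_prompt(parameters):
    """Return the positive prompt, even when it spans several lines.

    Assumption: the positive prompt ends at the first line starting
    with "Negative prompt:" or "Steps:".
    """
    positive_lines = []
    for line in parameters.replace("\r\n", "\n").split("\n"):
        if line.startswith("Negative prompt:") or line.startswith("Steps:"):
            break
        positive_lines.append(line)
    return "\n".join(positive_lines).strip()


example = (
    "woman, freckles,\n"
    "blue eyes, red hair\n"
    "Negative prompt: blurry\n"
    "Steps: 20, Sampler: Euler a, Seed: 3787865969"
)
print(split_positive_prompt(example))
```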

List and Count the most used Checkpoints (to finally get back some disk space)

from PIL import Image
from collections import Counter
import os
import json

input_dir = "F:/path/to/images"
output_file = "ckpts.txt"
temp = []
ckpts = []

for root, subdirs, files in os.walk(input_dir):
    for filename in files:
        if filename.endswith(".png"):
            file_path = os.path.join(root, filename)

            print(file_path)
            temp.append(file_path)
    

with open(output_file, "w", encoding='utf-8') as f:
    for file_path in temp:
        image = Image.open(file_path)
        
        metadata = json.dumps(image.info.get("parameters", ""), ensure_ascii=False) [1:-1].replace('\\u0000', '').strip().replace('\\n', '\n')
        # print(f"{file_path} Done !")
        # print(metadata.encode('utf-8'))
        # images without a "Model:" entry are skipped instead of raising an IndexError
        if ', Model:' in metadata:
            ckpts.append(metadata.split(', Model:')[1].split(', ')[0])
        
        image.close()
    
    # print(Counter(ckpts).most_common())

    for value, count in Counter(ckpts).most_common():
        f.write(f"{value}, {count}\n")

Result sample :

The number is the number of images in the folder and subfolders generated with that checkpoint

 jokelessPONYMODEL_v10, 447
 gehennaFusionSDXLPony_v15, 405
 ponyDiffusionV6XL_v6StartWithThisOne, 390
 theMagicSauce_v10, 377
 CheckpointFlat2d_v1, 353
 ...
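
To actually get disk space back, the counted names can be compared with the checkpoint files on disk. This is only a sketch under assumptions: that the name written after "Model:" matches the checkpoint file name without its extension, and that you build the list of files yourself (for example with os.listdir on your models folder). unused_checkpoints is a hypothetical helper name.

```python
from pathlib import Path


def unused_checkpoints(used_names, checkpoint_files):
    """Return the checkpoint files whose name never shows up in used_names.

    Assumption: a used name equals the file name without extension.
    Names are stripped and compared case-insensitively.
    """
    used = {name.strip().lower() for name in used_names}
    return sorted(
        str(file)
        for file in checkpoint_files
        if Path(file).stem.lower() not in used
    )


# names as written by the script above (note the leading space)
used_names = [" jokelessPONYMODEL_v10", " theMagicSauce_v10"]
# made-up file list for the example
files = [
    "jokelessPONYMODEL_v10.safetensors",
    "someOldModel_v1.ckpt",
    "theMagicSauce_v10.safetensors",
]
print(unused_checkpoints(used_names, files))
```

Double-check the list by hand before deleting anything, since a renamed checkpoint file would show up as unused.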

Copy all files that have some specific word/lora/whatever in the prompt to another folder

I know, it's beginning to get ugly... There's no need to open that .txt file, but I copy/pasted from the other scripts...
Set the your_word variable to whatever you're looking for, obviously. Here, for reasons I can't remember, I'm copying all the images that have "middle-finger" in their prompt to a folder named "send_to_boss".

from PIL import Image
from pathlib import Path
import os
import json
import shutil


input_dir = "F:/path/to/images"
output_dir = input_dir + "/send_to_boss"
output_file = output_dir + "/output.txt"
your_word = 'middle-finger'

temp = []

Path(output_dir).mkdir(parents=True, exist_ok=True)

for root, subdirs, files in os.walk(input_dir):
    # skip the output folder, otherwise a second run tries to copy files onto themselves
    if os.path.abspath(root) == os.path.abspath(output_dir):
        continue
    for filename in files:
        if filename.endswith(".png"):
            file_path = os.path.join(root, filename)

            #print(file_path)
            temp.append(file_path)
    
print('\n--\n')
with open(output_file, "w", encoding='utf-8') as f:
    for file_path in temp:
        image = Image.open(file_path)
        
        # .get() returns an empty string when the image has no "parameters" metadata
        metadata = json.dumps(image.info.get("parameters", ""), ensure_ascii=False) [1:-1].replace('\\u0000', '').strip().replace('\\n', '\n')
        image.close()
        if your_word.lower() in metadata.lower():
            shutil.copy(file_path, output_dir)
            print(f"{file_path} Copied !")

Variants :

The good thing about this piece of code is that you can filter and copy your images by pretty much anything you want, by changing a few things.
Examples :

# copy all images with a specific seed
metadata = json.dumps(image.info["parameters"], ensure_ascii=False) [1:-1].replace('\\u0000', '').strip().replace('\\n', '\n').split(', Seed: ')[1].split(', ')[0]
if "3787865969".lower() in metadata.lower():

# copy all images with a specific lora
metadata = json.dumps(image.info["parameters"], ensure_ascii=False) [1:-1].replace('\\u0000', '').strip().replace('\\n', '\n')
if "<lora:nemimontoya".lower() in metadata.lower():

# copy all images generated by a specific checkpoint
metadata = json.dumps(image.info["parameters"], ensure_ascii=False) [1:-1].replace('\\u0000', '').strip().replace('\\n', '\n').split(', Model: ')[1].split(', ')[0]
if "ponyDiffusionV6XL_v6StartWithThisOne".lower() in metadata.lower():

# copy all images generated with a specific sampler
metadata = json.dumps(image.info["parameters"], ensure_ascii=False) [1:-1].replace('\\u0000', '').strip().replace('\\n', '\n').split(', Sampler: ')[1].split(', ')[0]
if "Euler a".lower() in metadata.lower():

and so on....
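
The variants above all repeat the same .split(...)[1].split(', ')[0] pattern, which raises an IndexError when the field is missing. A possible sketch of a small helper that returns None instead: get_field is a hypothetical name, and it assumes the settings sit on the last line of the metadata as "Name: value" pairs separated by ", ". Like the split version, it cuts values that contain a comma.

```python
import re


def get_field(parameters, name):
    """Return the value of one setting (e.g. "Seed"), or None if absent.

    Only the last line is searched, so text inside the prompt that
    happens to look like a setting is ignored.
    """
    settings_line = parameters.strip().split("\n")[-1]
    match = re.search(r"(?:^|, )" + re.escape(name) + r": ([^,]*)", settings_line)
    return match.group(1).strip() if match else None


example = (
    "woman, freckles\n"
    "Negative prompt: blurry\n"
    "Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 3787865969, "
    "Model hash: abc123, Model: ponyDiffusionV6XL_v6StartWithThisOne"
)
print(get_field(example, "Seed"))
print(get_field(example, "Model"))
print(get_field(example, "Clip skip"))
```

With that, a filter becomes something like: if get_field(metadata, "Sampler") == "Euler a":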

List and Count most used LoRAs and their respective most-used weights

The one I'm most proud of, even if, again, the code is very dirty. It can be resource-intensive, I guess.
Finally I can quickly see, in my "best images" folder, which LoRAs are used the most and at what weight, so I can get rid of the ones that don't work and refine the weights in my wildcard files.

from PIL import Image
from pathlib import Path
from collections import Counter
import os
import json
import re


input_dir = "F:/path/to/images"
output_file = "loras.txt"
temp = []
full_loras = {}
loras = []

for root, subdirs, files in os.walk(input_dir):
    for filename in files:
        if filename.endswith(".png") and not "Copie" in filename:
            file_path = os.path.join(root, filename)

            print(file_path)
            temp.append(file_path)
    
print('\n--\n')
with open(output_file, "w", encoding='utf-8') as f:
    for file_path in temp:
        print(f'{file_path}')
        image = Image.open(file_path)
        
        metadata = json.dumps(image.info.get("parameters", ""), ensure_ascii=False) [1:-1].replace('\\u0000', '').strip().replace('\\n', '\n').split('\n')[0]
        if "<lora:" in metadata.lower():
            # search the text itself: str(metadata.encode("utf-8")) would mangle non-ASCII names
            lors = re.findall(r'<.*?>', metadata)
            for lor in lors:
                if "<lora:" in lor:
                    l = lor.split('<lora:')[1].split(':')[0]
                elif "<lyco:" in lor:
                    l = lor.split('lyco:')[1].split(':')[0]
                # for some reason I had one of these
                elif "<hypernet:" in lor:
                    l = lor.split('hypernet:')[1].split(':')[0]
                else:
                    # not a lora/lyco/hypernet tag: skip it instead of reusing the previous name
                    continue

                # a tag without a weight (<lora:name>) would raise
                # "IndexError: list index out of range" on the weight line, so skip it
                if f'{l}:' not in lor:
                    continue
                # print(f'{l}')
                weight = lor.split(f'{l}:')[1].split('>')[0]

                loras.append(str(l))
                if not l in full_loras:
                    full_loras[l] = []
                if not '|' in weight:
                    full_loras[l].append(weight)
        image.close()
    
    for value, count in Counter(loras).most_common():
        # print(f"{value}, {count}")
        f.write(f"{value}, {count} imgs\n")
        for v, c in Counter(full_loras[value]).most_common():
            f.write(f"\t{v} - {c}\n")
        f.write("\n")

Results sample :

g0th1cPXL, 362 imgs
	0.4 - 54
	0.5 - 50
	0.3 - 47
	0.9 - 44
	0.6 - 39
	0.8 - 32
	0.7 - 30
	1.0 - 27
	0.2 - 20
	1.2 - 3
	1.1 - 2
	0.55 - 1
	0.65 - 1
	0.72 - 1
	0.45 - 1

Mercy_Overwatch_Ultimate_Pack_PonyXL_LoRA, 92 imgs
	1 - 48
	0.6 - 11
	0.7 - 8
	0.8 - 7
	0.9 - 6
	1.2 - 3
	1.0 - 3
	0.3 - 3
	0.5 - 2
	0.4 - 1

Re-lMayer_PXL, 91 imgs
	0.6 - 18
	0.8 - 16
	1.0 - 14
	0.5 - 13
	1.1 - 12
	0.9 - 9
	0.7 - 9

MedievalBarMaid-medbarmaid-Pony-v1, 75 imgs
	0.6 - 15
	0.9 - 14
	0.7 - 11
	0.8 - 8
	1.1 - 6
	0.3 - 6
	0.5 - 4
	1.0 - 4
	0.4 - 4
	1.2 - 3
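
For anyone who'd rather not chain splits, here's a sketch of the same counting done with one regular expression. It's an alternative under assumptions, not the script above: count_loras and TAG_RE are made-up names, it works on a list of prompt strings that you extract yourself, and it counts each LoRA once per image (the script above counts every occurrence).

```python
import re
from collections import Counter, defaultdict

# <lora:name:weight>, <lyco:name:weight> or <hypernet:name:weight>; weight is optional
TAG_RE = re.compile(r"<(lora|lyco|hypernet):([^:>]+)(?::([^>]*))?>", re.IGNORECASE)


def count_loras(prompts):
    """Return (images per LoRA, weights per LoRA) for a list of prompts."""
    images_per_lora = Counter()
    weights_per_lora = defaultdict(Counter)
    for prompt in prompts:
        seen_in_image = set()
        for _kind, name, weight in TAG_RE.findall(prompt):
            if name not in seen_in_image:
                images_per_lora[name] += 1
                seen_in_image.add(name)
            # same filter as the script: ignore unresolved {a|b} weights
            if weight and "|" not in weight:
                weights_per_lora[name][weight] += 1
    return images_per_lora, weights_per_lora


prompts = [
    "<lora:g0th1cPXL:0.4> woman, <lora:Re-lMayer_PXL:0.6>",
    "<lora:g0th1cPXL:0.5>, <lyco:someLyco:1>",
    "no tags here",
]
images, weights = count_loras(prompts)
for name, count in images.most_common():
    print(f"{name}, {count} imgs")
    for weight, c in weights[name].most_common():
        print(f"\t{weight} - {c}")
```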

And that's it for the moment. I hope this can be useful to someone. Don't hesitate to suggest other stuff that might be possible with the image metadata.

Happy generations !

Search your image folder and get Stats - Quick'n'Dirty Python scripts for images hoarders like me