Visper - Visual Data PreP For Training (fantasy)

Project Goal: Develop a self-contained macOS application for advanced image processing, including background removal (with color/transparency/mask options), resizing, format conversion, batch renaming, metadata editing, watermark removal, image upscaling, and image captioning. Supports batch processing, recovery, remote access, sequential model processing, NCNN conversion (if possible), and GPU acceleration.

I. Application Features:

Core Functionality:
- Background Removal (RMBG-1.4, RMBG-2.0, BiRefNet, "No Background Removal").
- Background Options:
  - Transparent: Leave the background transparent (standard PNG behavior).
  - Solid Color: Fill the background with a user-selected color.
  - Mask Image: Create a separate grayscale mask image (white for foreground, black for background).
- Image/Folder Input.
- Output: Save processed images (PNG, JPG, HEIC, WebP, GIF, etc.).
- Batch Renaming.
- Watermark Removal (Florence-2 + LaMa).
- Image Upscaling.
- Image Captioning.
- Image Format Conversion.
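Batch renaming is easiest to make safe if the app first builds a rename plan that the user can preview, and only then applies it. Below is a minimal sketch under stated assumptions. The function name `plan_renames`, the `{prefix}_{index:04d}{ext}` pattern and the `img` default prefix are all hypothetical, not settings defined elsewhere in this plan.

```python
import os

def plan_renames(paths, pattern="{prefix}_{index:04d}{ext}", prefix="img", start=1):
    """Build (old_path, new_path) pairs without touching the filesystem.

    Files are numbered in sorted order. Extensions are kept but lower-cased.
    A ValueError is raised if the pattern would give two files the same name.
    """
    plan = []
    seen = set()
    for index, old in enumerate(sorted(paths), start=start):
        folder, name = os.path.split(old)
        _, ext = os.path.splitext(name)
        new_name = pattern.format(prefix=prefix, index=index, ext=ext.lower())
        new = os.path.join(folder, new_name)
        if new in seen:
            raise ValueError(f"Pattern produces a duplicate name: {new}")
        seen.add(new)
        plan.append((old, new))
    return plan
```

The UI can show this plan in the preview area and apply it with `os.rename` only after the user confirms. A real implementation would also need to check for collisions with files that already exist in the folder.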
User Interface:
- Model Selection (Background Removal, Upscaling, Captioning).
- Multiple Model Selection (Background Removal).
- Image Preview.
- Output Preview (Optional).
- Progress Indicator.
- "Process" Button.
- "Kill Task" Button.
- "Open Folder" Button.
- Settings Panel:
  - Output Folder Selection.
  - Aspect Ratio Presets.
  - Custom Aspect Ratio Input.
  - Maximum Resolution.
  - Background Options:
    - Radio buttons: "Transparent", "Solid Color", "Mask Image".
    - If "Solid Color": A color picker widget.
  - Batch Rename Settings.
  - Watermark Removal Settings.
  - Upscaling Model Selection and Factor/Resolution.
  - Captioning Settings.
  - Image format selection option.
  - GPU Acceleration Checkbox.
- Metadata Editor Tab/Window.
- Image Captioning Display Area.
- About/Version Page.
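The Aspect Ratio Presets, Custom Aspect Ratio Input and Maximum Resolution settings reduce to two small geometry calculations. The sketch below makes several assumptions. It applies an aspect ratio by center-cropping rather than padding, and it interprets "maximum resolution" as a limit on the longer side in pixels. The helper names are illustrative only.

```python
def center_crop_box(width, height, ratio_w, ratio_h):
    """Largest centered (left, top, right, bottom) box with aspect ratio_w:ratio_h."""
    target = ratio_w / ratio_h
    if width / height > target:
        # Image is wider than the target: keep the full height, trim the sides.
        new_w = round(height * target)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Image is taller than (or equal to) the target: keep the full width, trim top/bottom.
    new_h = round(width / target)
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


def fit_within(width, height, max_side):
    """Scale (width, height) down so the longer side is at most max_side pixels."""
    longest = max(width, height)
    if longest <= max_side:
        return (width, height)  # Never upscale here; that is the upscaler's job.
    scale = max_side / longest
    return (max(1, round(width * scale)), max(1, round(height * scale)))
```

The returned tuples match what Pillow's `Image.crop()` (a 4-tuple box) and `Image.resize()` (a 2-tuple size) expect. For example, cropping a 1920x1080 image to a 1:1 preset gives the box (420, 0, 1500, 1080).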
Batch Processing: (Sequential, multi-model).
Recovery Mechanism: (Checkpointing and resume).
Context Menu Integration: (Quick Actions).
Remote Access (API):
- Built-in Web Server (Flask).
- API Endpoints: /upload, /process, /status, /download, /metadata, /captions, / (optional HTML).
- IP Address and Port Display.
Image Manipulation Without Background Removal.
Metadata Editing (Civitai/AUTOMATIC1111 Focus).
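AUTOMATIC1111's web UI typically stores generation settings in a PNG text chunk with the keyword "parameters", and Civitai reads the same field on upload. Both points are assumptions to verify against real files. The sketch below uses only the standard library to read `tEXt` chunks and to build a new one for writing. It does not handle `iTXt` chunks, which is where non-Latin-1 prompts usually end up, and it does not verify CRCs when reading.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict:
    """Return {keyword: text} for every tEXt chunk in a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            chunks[keyword.decode("latin-1")] = text.decode("latin-1")
        if chunk_type == b"IEND":
            break
        pos += 12 + length
    return chunks


def make_text_chunk(keyword: str, text: str) -> bytes:
    """Build a tEXt chunk; the CRC covers the chunk type and the data."""
    body = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    crc = zlib.crc32(b"tEXt" + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + b"tEXt" + body + struct.pack(">I", crc)
```

To edit metadata, the editor could drop the old "parameters" chunk and insert a new one before `IEND`. Alternatively, when saving through Pillow, it could pass a `PngImagePlugin.PngInfo` object with the updated text.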
Model Format Conversion (NCNN): (If feasible).
GPU Acceleration:
New Features Added:
- Change background color: Allow the user to change the background to a given color.
- Leave the background transparent.
- Create mask image: Allow the user to create a mask image (black for the background, white for the object).

II. Technology Stack: (No changes)

III. Development Stages:

Project Setup: (Same)
GUI Development:
- Add UI elements for the background options:
  - Radio buttons for "Transparent", "Solid Color", "Mask Image".
  - A color picker widget (e.g., QColorDialog) that appears when "Solid Color" is selected.
Core Logic Implementation:
- Modify the image processing functions to handle the new background options:
  - Transparent: The default behavior after background removal. Apply the mask as the image's alpha channel (e.g., with Image.putalpha()); no further compositing is needed.
  - Solid Color: Combine the foreground with a new image filled with the selected color, using the mask to choose between them (Pillow's Image.composite(), or Image.paste() with the mask passed as its third argument).
  - Mask Image: Save the generated mask (from the background removal model) as a separate grayscale image.
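At the level of a single pixel, the three options differ only in what they do with the mask value. The sketch below illustrates that rule and assumes an 8-bit mask (0 = background, 255 = foreground). Pillow's `Image.composite()` performs the blend for whole images in native code, and its integer rounding may differ slightly from this version. The function names are hypothetical.

```python
def blend_pixel(fg, bg, mask_value):
    """Blend one RGB foreground pixel over a background color.

    255 keeps the foreground, 0 shows the background, and in-between values
    (soft edges produced by the model) mix the two linearly.
    """
    alpha = mask_value / 255
    return tuple(round(f * alpha + b * (1 - alpha)) for f, b in zip(fg, bg))


def apply_background_option(fg_rgb, mask_value, option, color=(255, 255, 255)):
    """Per-pixel result of the "transparent", "color" and "mask" options."""
    if option == "transparent":
        return fg_rgb + (mask_value,)  # The mask becomes the alpha channel.
    if option == "color":
        return blend_pixel(fg_rgb, color, mask_value)
    if option == "mask":
        return mask_value  # Only the grayscale mask value is kept.
    raise ValueError(f"Unknown background option: {option!r}")
```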
Model Integration: (Slight modifications)
- The background removal scripts now need to return the mask (in addition to or instead of the image with the transparent background).
- The main application will then use this mask to create the desired output (transparent, colored, or mask image).
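If a background removal script produces a floating-point foreground map rather than an 8-bit image, it has to convert that map to 0-255 before saving it as the grayscale mask. A common post-processing step is min-max normalization, optionally followed by a hard threshold. Whether RMBG-1.4, RMBG-2.0 or BiRefNet actually needs this depends on each model's output and should be checked per model. The sketch below works on a flat list of floats for clarity; real code would operate on an array. The function name and the "constant map means all background" rule are assumptions.

```python
def to_grayscale_mask(values, threshold=None):
    """Convert model output floats into 8-bit mask values (0..255).

    The map is min-max normalized so its lowest value becomes 0 (background)
    and its highest becomes 255 (foreground). If threshold (0..1) is given,
    the result is binarized to a hard 0/255 mask instead of keeping soft edges.
    A constant map (no contrast) is treated as all background.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    out = []
    for v in values:
        norm = (v - lo) / span if span else 0.0
        if threshold is not None:
            out.append(255 if norm >= threshold else 0)
        else:
            out.append(int(norm * 255 + 0.5))  # Round half up to the nearest level.
    return out
```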
Threading Implementation: (Same)
Batch Processing Implementation: (Adapt to handle new background options)
Recovery Mechanism Implementation: (Same)
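One way to implement checkpointing and resume is to record each finished file in a small JSON file, and to skip those files when the batch restarts. The sketch below assumes that file paths are the checkpoint keys and that the checkpoint lives next to the output. For sequential multi-model runs, the key could combine the file and the model name instead. The class name and JSON layout are hypothetical.

```python
import json
import os
import tempfile


class Checkpoint:
    """Records finished files so an interrupted batch can resume."""

    def __init__(self, path):
        self.path = path
        self.done = set()
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.done = set(json.load(f)["done"])

    def pending(self, files):
        """Files from the batch that have not been processed yet, in order."""
        return [f for f in files if f not in self.done]

    def mark_done(self, file):
        self.done.add(file)
        # Write to a temporary file, then atomically replace the checkpoint,
        # so a crash mid-write never leaves a corrupt checkpoint behind.
        folder = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=folder, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump({"done": sorted(self.done)}, f)
        os.replace(tmp, self.path)
```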
Multiple Model Instance Implementation: (Sequential)
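Sequential multi-model processing and the "Kill Task" button can share one worker loop. The loop runs each model over every file in turn and checks a cancellation flag between items. The sketch below assumes a `threading.Event` that the button sets from the GUI thread. `process_one` is a hypothetical callback standing in for the per-image pipeline.

```python
import threading


def run_batch(files, models, process_one, cancel_event, on_progress=None):
    """Run every model over every file, one at a time, until cancelled.

    Returns the (file, model) pairs that completed, which a checkpoint
    could record so that a cancelled run can resume later.
    """
    completed = []
    total = len(files) * len(models)
    for model in models:  # Models in the outer loop: one model in use at a time.
        for file in files:
            if cancel_event.is_set():  # Set by the "Kill Task" button.
                return completed
            process_one(file, model)
            completed.append((file, model))
            if on_progress:
                on_progress(len(completed), total)  # Drives the progress indicator.
    return completed
```

Because the process_image snippet in section V uses `subprocess.run`, this flag is only checked between images. Stopping a script in the middle of an image would require launching it with `subprocess.Popen` and calling its `terminate()` method.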
"Kill Task" Button Implementation: (Same)
Context Menu Integration: (Same)
API Implementation:
- Modify the /process endpoint to accept a background_option parameter (e.g., "transparent", "color", "mask") and a background_color parameter (if "color" is selected).
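The /process endpoint should validate these two parameters before queuing any work. The sketch below is framework-agnostic and accepts any dict-like object, such as Flask's `request.form`. It makes several assumptions: "transparent" is the default when no option is sent, and the leading "#" on the color is optional because it must be URL-encoded in query strings. The function name is hypothetical.

```python
import re

VALID_OPTIONS = {"transparent", "color", "mask"}
HEX_COLOR = re.compile(r"#?([0-9a-fA-F]{6})")


def parse_background_params(params):
    """Validate /process parameters and return (option, rgba_or_None).

    Raises ValueError with a message suitable for a 400 response.
    """
    option = params.get("background_option", "transparent")
    if option not in VALID_OPTIONS:
        raise ValueError(f"background_option must be one of {sorted(VALID_OPTIONS)}")
    if option != "color":
        return option, None
    match = HEX_COLOR.fullmatch(params.get("background_color", ""))
    if not match:
        raise ValueError("background_color must be a hex color like #FF0000")
    hex_digits = match.group(1)
    rgb = tuple(int(hex_digits[i:i + 2], 16) for i in (0, 2, 4))
    return option, rgb + (255,)  # Opaque RGBA, matching the GUI path.
```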
Packaging (PyInstaller): (Same)
Testing: (Thorough testing of all background options)
Notarization:

IV. Detailed Function Introductions:

All previous functions.
Background Options:
- Transparent: The default behavior. The background is removed, leaving transparency.
- Solid Color: The user selects a color using a color picker. The background is removed, and the transparent areas are filled with the chosen color.
- Mask Image: The background is removed, and a separate grayscale image is created, representing the mask (white for foreground, black for background). This is useful for further image editing or for training AI models.

V. Code Modifications (Illustrative - Python/PyQt):

Here's a snippet illustrating how you might modify the process_image function (within your main app.py) to handle the new background options. This is a simplified example and assumes you have already obtained the mask from your background removal script.

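import os
import subprocess
import tempfile

from PIL import Image, ImageColor
# ... other imports

def process_image(self, image_path, output_path, model, settings):
    # ... (get selected model, settings, etc.) ...
    # python_exe, script_path and model_dir are resolved in the elided code above.

    # Temporary file where the background removal script writes the mask.
    fd, temp_output_path = tempfile.mkstemp(suffix=".png")
    os.close(fd)

    # --- Run Background Removal Script (get the MASK) ---
    # (Same as before, but the script now writes a grayscale MASK to temp_output_path)
    process = subprocess.run(
        [python_exe, script_path, image_path, temp_output_path, model_dir],
        capture_output=True,
        text=True,
        check=False,
    )

    if process.returncode != 0:
        os.remove(temp_output_path)
        # Handle error (process.stderr holds the script's error output)...
        return

    try:
        # --- Load the original image and the mask ---
        orig_image = Image.open(image_path).convert("RGBA")
        mask_image = Image.open(temp_output_path).convert("L")  # Load as grayscale
        if mask_image.size != orig_image.size:
            # putalpha() and composite() need the mask to match the image size.
            mask_image = mask_image.resize(orig_image.size)

        # --- Apply Background Option ---
        option = settings["background_option"]
        if option == "transparent":
            orig_image.putalpha(mask_image)  # Use the mask as the alpha channel
            final_image = orig_image  # Needs an alpha-capable format (PNG, WebP)

        elif option == "color":
            color = settings["background_color"]  # e.g., "#FF0000" (red)
            color_tuple = ImageColor.getrgb(color) + (255,)  # Convert hex to RGBA
            new_background = Image.new("RGBA", orig_image.size, color_tuple)
            # Take pixels from orig_image where the mask is white, else the background.
            final_image = Image.composite(orig_image, new_background, mask_image)
            final_image = final_image.convert("RGB")  # Opaque, so JPG output also works

        elif option == "mask":
            final_image = mask_image  # Just use the mask image
            # Replace only the extension so ".png" elsewhere in the path is untouched.
            root, _ = os.path.splitext(output_path)
            output_path = root + "_mask.png"

        else:
            raise ValueError(f"Unknown background option: {option!r}")

        # --- Save the final image ---
        final_image.save(output_path)

    except Exception as e:
        print(f"Error processing image: {e}")
        # Handle the error...

    finally:
        os.remove(temp_output_path)  # Clean up the temporary mask file

    # ... (rest of the processing - upscaling, captioning, etc.) ...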

Key changes in the code snippet:

Load Mask: The code now loads the mask_image (assuming your background removal script saves it to temp_output_path).
background_option: The code checks the settings["background_option"] value (which comes from the radio buttons in your UI).
Transparent: If "transparent", it applies the mask directly using putalpha().
Solid Color:
- Gets the selected color from settings["background_color"].
- Uses ImageColor.getrgb() to convert the hex color code (e.g., "#FF0000") to an RGB tuple.
- Adds an alpha value of 255 to create an RGBA tuple: color_tuple = ImageColor.getrgb(color) + (255,)
- Creates a new image (new_background) filled with the selected color.
- Uses Image.composite() to combine the original image and the new background, using the mask to control which parts are taken from each image: white mask pixels keep the original, black pixels show the new background, and gray pixels blend the two.
Mask Image: If "mask", it uses mask_image as the final image.
Output Path Change: When creating a mask image, the code appends "_mask" to the output file name so the mask does not overwrite the regular output.
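Output naming and image mode can also be decided in one place before saving. This matters because JPEG cannot store transparency, so saving an RGBA image as JPG fails. The sketch below makes several assumptions. The set of alpha-capable formats is illustrative, falling back to PNG for transparent output is a design choice rather than something this plan specifies, and the function name is hypothetical. The function replaces only the extension via `os.path.splitext`. A plain `str.replace(".png", ...)` would also match ".png" elsewhere in the path.

```python
import os

# Assumption: output formats treated as able to store a full alpha channel.
ALPHA_FORMATS = {".png", ".webp"}


def resolve_output(output_path, option):
    """Return (path, pillow_mode) to save with for a given background option."""
    root, ext = os.path.splitext(output_path)
    if option == "mask":
        return root + "_mask.png", "L"  # Grayscale mask, always PNG.
    if option == "transparent":
        if ext.lower() not in ALPHA_FORMATS:
            # JPEG and similar formats cannot hold transparency; fall back to PNG.
            return root + ".png", "RGBA"
        return output_path, "RGBA"
    if option == "color":
        return output_path, "RGB"  # Fully opaque, valid for any format.
    raise ValueError(f"Unknown background option: {option!r}")
```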