How to colorize Manga images with ControlNet in Auto1111

Hello everyone,

Important change (27.12.2025): I used the wrong preprocessor! You should use "lineart_anime" instead of "invert (from white bg & black line)". It also helps to change the Ending Control Step to 0.8-1.0 (0.8 works great for me).

New hint (01/01/2026): Want to match a specific anime style? Try using style LoRAs — they helped me a lot with a One Piece character.

Today, I’d like to describe my workflow for creating “Juujika no Rokunin” character models.
As you probably know, the manga has no anime adaptation yet, so to get fully colored models you have to do a few extra steps.
My solution: ControlNet Colorization.

Before we start: the workflow can be a bit tedious — there are several things you need to prepare in order to create a good image.


Preparation

Before you can even begin, some conditions must be met:

  • You need to know what the character actually looks like.
    Sounds obvious, but some side characters barely have any visual reference or appear only a few times.

  • You need a fair number of images to train a model.
    Usually between 30 and 200 images.
    My sweet spot is ~100.

  • Small trick: Inpainting
    If you don’t have enough images, you can inpaint existing ones to add variety.


Requirements

  • Installed Automatic1111, or a fork/extension (ReForge, Forge, Forge Neo…)

  • If you know how to use ComfyUI, it will probably work as well, but since I don’t use it, I can’t guarantee anything.

I’m currently using Forge f2.0.1v1.10.1-previous-669-gdfdcbab6 classic with Torch 2.8, since I have an RTX 5070 Ti GPU.
(If you have a suggestion for a newer, fully working Forge build, especially with xFormers for Blackwell, please drop it in the comments! Update: I actually found a newer Forge here. It supports some new features like SageAttention, FlashAttention, xFormers for Blackwell, CUDA streams...)

  • Some basic knowledge of Automatic1111

    • how to install extensions

    • basic prompting (Danbooru tagging, quality tags, etc.)

    • where to put your models (checkpoint, LoRA, ADetailer, ControlNet, ESRGAN)

  • An image editing tool (GIMP, Photoshop, etc.)

    • if you know what you're doing, you could also use OCR-based tools to detect and delete text; I tried them, but don't like the results

    • Snipping Tool

I'm currently using "Fotor" to edit the images, since it has a smart delete/AI replace function, but since it's not free, I am making this tutorial with GIMP. You also need MS Designer to crop the colored images (the completed image tends to have ugly borders/edges, and you don't want those in your trained model).

  • An Upscaler

I'm using Upscayl, but you could also use the img2img function of Auto1111 with a decent upscaler.
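
If you prefer to script this step, Auto1111 also exposes its upscalers over the API. Here is a minimal sketch, assuming the webui was started with --api and listens on the default port (the endpoint and field names are the standard API ones, but double-check them against your build):

    import base64
    import requests

    # Upscale a manga panel through Auto1111's "extras" endpoint.
    with open("panel.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    payload = {
        "image": image_b64,
        "upscaling_resize": 2,                 # 2x works for most panels
        "upscaler_1": "R-ESRGAN 4x+ Anime6B",  # any upscaler you have installed
    }
    r = requests.post("http://127.0.0.1:7860/sdapi/v1/extra-single-image", json=payload)
    r.raise_for_status()

    with open("panel_upscaled.png", "wb") as f:
        f.write(base64.b64decode(r.json()["image"]))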

Some words about extensions: as I described in my other article, it's not the best idea to use that many extensions at once. That doesn't mean they are bad; they just tend to produce results that are too polished, and that is counterproductive for model training. In the case of colorization with ControlNet, I still recommend using them (a reduced set, though; I would probably use CD Tuner, SDXL Styles and FreeU).

*Already integrated in Forge f2.0.1v1.10.1-previous-669-gdfdcbab6

So, let’s get started!


Let's get started

  • Gather your images (Manga reader, Google search, etc.)

Tip for manga readers:
Zoom in — even if the image looks blurry, we’ll upscale it anyway.
(As long as you can clearly recognize the character!)

  • Upscale your images

    • the upscale factor depends on the image; I use 2x most of the time

  • clean your image up; that means:

    • clean up text

    • clean up speech bubbles

    • clean up background noise (e.g. another person in the background)

    • complete body parts yourself, or crop the image so it doesn't look strange

      • You don't need to be a pro, just draw some outlines. ControlNet and your prompt will do the rest!

[Image: example of completing a body part yourself]

  • open Auto1111

  • install all of the above extensions (at least the mandatory ones) and put your models (at least MistoLine for ControlNet) in the right place

  • enable the ControlNet checkbox and drag & drop your edited and upscaled manga image

    • ControlNet Settings:

      • check "ControlNet Unit 0"

      • check "Enable"

      • check "Pixel Perfect"

      • check "lineart"

      • choose "invert (from white bg & black line)" "lineart_anime" as Preprocessor

      • change model to "mistoLine_fp16" OR "mistoline_rank256" (you probably have to click the "refresh" button beforehand)

      • change control weight to 1.5

        • <1.5 some characteristics will be lost

        • >1.5 most likely overfitting

      • NEW (27.12.2025): Change Ending control step to 0.8-1.0 (0.8 works great for me)

      • Leave everything else at default and click the arrow button on the right side under the preview panel; this sends the width and height to your sampler settings!

In the end, it should look something like this:

[Image: the configured ControlNet unit]
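
For reference, the same unit can also be set over the API via the ControlNet extension (sd-webui-controlnet). A sketch under the assumption that the webui runs with --api; the unit keys follow the extension's documented payload, but they have changed between versions, so verify against yours:

    import base64
    import requests

    with open("panel_cleaned.png", "rb") as f:
        panel_b64 = base64.b64encode(f.read()).decode()

    payload = {
        "prompt": "1girl, solo, anime coloring, masterpiece, best quality",
        "negative_prompt": "worst quality, sketch, greyscale, monochrome, lineart",
        "steps": 30,
        "cfg_scale": 7,
        "width": 832,    # must stay within the known SDXL sizes
        "height": 1216,
        "alwayson_scripts": {
            "controlnet": {
                "args": [{
                    "enabled": True,
                    "image": panel_b64,
                    "module": "lineart_anime",   # the corrected preprocessor
                    "model": "mistoLine_fp16",   # or mistoline_rank256
                    "weight": 1.5,
                    "guidance_end": 0.8,         # "Ending Control Step"
                    "pixel_perfect": True,
                }]
            }
        },
    }
    r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
    r.raise_for_status()
    with open("colorized.png", "wb") as f:
        f.write(base64.b64decode(r.json()["images"][0]))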


Prompting Tips and Sampler Settings

Now you need to write a prompt that tells ControlNet how it should colorize your image.
You should describe what’s in the image:

Positive prompt

  • Background (indoors, outdoors)

    • Tip: Keep it simple! Remember that you want to use the image for training, so you don't want noisy images. Try to simplify things.

  • Appearance of the character; for the Momoki image from above:
    prompt: 1girl, solo, portrait, from side, close-up, pale skin, short hair, purple hair, bangs, blue eyes, makeup, purple eyeshadow, purple lips

    • don't forget to describe the outfit, if any!

  • character action (what is she doing?):

    • evil smile, evil eyes, parted lips (in this case she's just smiling)

    • other typical Momoki tags are "holding shotgun", "holding syringe", "standing"

  • Quality and style specific tags

    • anime coloring, anime screencap, masterpiece, best quality, amazing quality, no lineart, oekaki

Negative prompt

  • Quality and style specific tags YOU DO NOT WANT:

    • worst quality, worst detail, sketch, comic, greyscale, monochrome, blurry, lineart

  • Body-specific tags YOU DO NOT WANT:

    • bad anatomy, bad proportions, extra limbs, extra digit, extra legs, extra legs and arms, disfigured, missing arms, too many fingers, fused fingers, missing fingers, unclear eyes

General recommendation: use the Danbooru database; you can find the tags you're looking for under Tag Groups.
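
If you script the workflow (e.g. through the API sketches in this article), it can help to keep these tag groups separate and join them at the end. A small illustration using the Momoki example from above (the grouping is just my own convenience, not an Auto1111 feature):

    # Build the positive/negative prompts from the tag groups described above.
    appearance = ["1girl", "solo", "portrait", "from side", "close-up",
                  "pale skin", "short hair", "purple hair", "bangs",
                  "blue eyes", "makeup", "purple eyeshadow", "purple lips"]
    action     = ["evil smile", "evil eyes", "parted lips"]
    background = ["indoors"]
    quality    = ["anime coloring", "anime screencap", "masterpiece",
                  "best quality", "amazing quality", "no lineart", "oekaki"]

    positive = ", ".join(appearance + action + background + quality)
    negative = ", ".join(["worst quality", "worst detail", "sketch", "comic",
                          "greyscale", "monochrome", "blurry", "lineart",
                          "bad anatomy", "bad proportions", "extra limbs"])
    print(positive)
    print(negative)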

Sampler Settings

  • Sampler method: Euler a

  • schedule type: Automatic

  • Sampling steps: 30

  • Hires. fix: enabled

  • width: get it from controlnet!*

  • height: get it from controlnet!*

  • CFG Scale: 7

  • Upscaler: R-ESRGAN 4x+ Anime6B

  • hires steps: 20

  • denoising strength: 0.3-0.5 (I'm using 0.5)

  • upscale by: 1.5*

  • highres cfg scale: 7

*ControlNet will mess up the image if it's too big: your image should stay within the known SDXL dimensions (1024 x 1024, 1216 x 832, or 832 x 1216). If your image is bigger than that, you have to shrink it manually. If you do, scale both sides by the same factor to prevent a distorted image from being generated!
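
If you want to automate that check, here is a rough helper (my own convenience function, not part of Auto1111) that scales both sides by the same factor and rounds to a safe multiple:

    def fit_to_sdxl(width, height, max_side=1216, multiple=64):
        """Shrink (width, height) proportionally so the longer side fits
        typical SDXL sizes, rounding each side to a multiple of 64."""
        scale = min(1.0, max_side / max(width, height))
        w = max(multiple, round(width * scale / multiple) * multiple)
        h = max(multiple, round(height * scale / multiple) * multiple)
        return w, h

    print(fit_to_sdxl(1600, 1088))  # -> (1216, 832), a known SDXL size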

In the end, it should look something like this:

[Image: sampler settings configured as described]

Extensions (optional)

The extensions can improve the output. At the same time, you have to use them with care, since they can also distort the appearance of your character and the style and/or color of your image. We will use all of the above extensions except ADetailer in txt2img; we use ADetailer only in img2img at the end!

  • CFGRescale: simply enable, leave default (0.7)

  • SDXL Styles: enable and choose "Anime"

  • CD Tuner: enable, change Detail(d1) to 2 and leave the rest untouched

  • Detail Daemon: simply enable, leave defaults

  • CFG-Fix: simply enable, leave defaults

  • FreeU: enable, B1=1.1, B2=1.2, S1=0.6, S2=0.4

  • SelfAttentionGuidance: enable, leave defaults

The Last touch

After you get a good/usable output, send it to img2img to upscale and improve it (if you prefer to script this pass, see the sketch at the end of this section):

  • leave Sampler like before, just change:

    • Resize by: 1.5

    • Denoising strength: 0.15

  • You can also use the exact same extension settings as in txt2img; since the denoising strength is rather low, you don't have to be afraid of overfitting or changing too much

  • Now you need to enable and configure Adetailer:

    • 1st model: 99coins_anime_girl_face_m_seg.pt

      • detection confidence: 0.7

      • detection method: confidence

    • 3rd model: Nipple-yoro11x_bbox.pt (for NSFW content!)

      • positive prompt: <lora:IL20_NP31i:0.6> nipples,

        • you need this Lora for it

        • if you want other nipples, you have to change prompt:

          • <lora:IL20_NP31i:0.6> puffy nipples

          • <lora:IL20_NP31i:0.6> long nipples

          • etc...

      • Detection -> detection confidence: 0.7

      • Detection -> detection method: confidence

      • Mask Preprocessing -> Mask erosion (-) / dilation (+): 24

      • Inpainting -> inpaint mask blur: 12

      • Inpainting -> inpaint only masked padding, pixels: 96

      • Inpainting -> inpaint denoising strength: 0.4

      • Inpainting -> use separate width/height: 1024/1024

    • 4th model: pussy_v4_best.pt (in my case) or pussy_yolo11s_seg_best.pt (latest)
      (NSFW only!)

      • Detection - same as Nipple

      • Mask Preprocessing -> Mask erosion (-) / dilation (+): 4 (default)

      • Inpainting -> same as Nipple

  • For Upscaler follow these steps:

    • scroll down until you see "Scripts"

    • Assuming you installed the extension, there should be a script named "Ultimate SD upscale". Click on it; now you should be able to configure it:

      • Target size type: From img2img settings

      • Upscaler: 4x-UltraSharp (assuming you put your upscaler model .pth in your ESRGAN folder)

      • Type: Chess

      • Tile width: 1024

      • Tile height: 0

      • Mask Blur: 32*

      • Padding: 52*

*If you leave the defaults, you will probably notice some subtle lines from the tiles; increasing these settings helps make them less noticeable.

It should look like this in the end:

[Image: img2img, ADetailer and Ultimate SD Upscale configured as described]
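
If you batch many panels, this whole pass can be scripted too. A minimal sketch, again assuming the webui runs with --api and the ADetailer extension is installed; the ADetailer args layout has changed between versions (and Ultimate SD Upscale has its own script_args), so treat this as an illustration and check each extension's README:

    import base64
    import requests

    with open("colorized.png", "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode()

    payload = {
        "init_images": [img_b64],
        "prompt": "1girl, solo, anime coloring, masterpiece, best quality",
        "denoising_strength": 0.15,  # keep it low: a quality pass, not a repaint
        # the API has no "Resize by", so compute original size * 1.5 yourself:
        "width": 1248,
        "height": 1824,
        "steps": 30,
        "cfg_scale": 7,
        "alwayson_scripts": {
            "ADetailer": {
                "args": [
                    True,   # enable ADetailer
                    False,  # don't skip the img2img pass itself
                    {
                        "ad_model": "99coins_anime_girl_face_m_seg.pt",
                        "ad_confidence": 0.7,
                        "ad_denoising_strength": 0.4,
                    },
                ]
            }
        },
    }
    r = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
    r.raise_for_status()
    with open("final.png", "wb") as f:
        f.write(base64.b64decode(r.json()["images"][0]))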

The really last touch

You might think you're done, but your output might not be fit to be a good image for training. There are still a few issues to address, like:

  • weird edges/borders

  • subtle tile lines

  • noise

for the "weird edges" problem, you have to crop your image with your MS Designer (or some other tool). Do not take this on light shoulder, you model WILL generate them to if the influence is high!

For the subtle tile lines you can use the healing tool in GIMP; the same goes for the noise. If the noise is bigger, you should probably use the clone tool first and the healing tool for the cleanup.

Now you can train your model. Most of the time, the first version will still have room for improvement, since your training images may contain flaws and errors (mostly in quality, color, and saturation; some comic-style elements might also be present).
Therefore, I recommend creating a Version 2.0 using high-resolution images generated from Version 1.0.

FAQ

Q: The shadows (on collarbone, cleavage, neck, etc.) are too noticeable or messed up!

A: You have to make the shadows smoother. I'm using the healing tool in GIMP for this. Here is an example image:

[Image: shadow smoothing example]

Q: My Output Image in txt2img is messed up!

A: Check the image width and height; it should stay within the known SDXL dimensions.

Q: You said you could use inpainting to vary an existing image. What do you mean by that?

A: You can simplify an existing manga image for a greater range of choices. To be honest, I often use this technique for adult content. As an example, look at the following image:

[Image: comparison of two manga panels]

The right one is much better, but doesn't leave room for variety. For that, the left one is the better choice: you can inpaint it in img2img with a high denoising strength first. Most of the time inpainting is not even needed, because the body part is empty and the AI can generate what you want from the prompt and context.

Q: In img2img my image turns out weird!

A: Did you change the denoising strength? I believe txt2img also sends along the denoising strength from Hires. fix (0.5), and that is way too much! Keep in mind: for minor quality changes keep the denoising strength low; if you want to change an image drastically (e.g. turning a man into a woman), you need a higher denoising strength. Also, do not increase the denoising strength of your ADetailer model, as it also turns weird if it's too high. If the AI generates something like a cross necklace or jewelry in general, you should probably turn off all extensions except ADetailer and the upscaler.

Q: My character changes too much in txt2img!

A: Check your ControlNet weight; it should be 1.5, not 1.0 (the default). If that's still not enough, you can increase the weight, but most of the time the output behaves strangely with weights near 2.0. Also turn all extensions off (but leave Hires. fix enabled). If it's still not right, disable Hires. fix too. If the problem persists, check your prompt.

Q: I have a hard time generating two or more persons!

A: You shouldn't use such images in general (at least for character LoRAs), as they can lead to a bad model; you want to avoid any noise in the image. Furthermore, Auto1111 (and, as far as I know, every other fork/extension) doesn't support multi-character generation by itself. "Support" is a bit exaggerated: it will generate the desired number of characters, but also mix their appearances together. You need extensions like Regional Prompter to split the image into separate areas with their own prompts. That's okay for static images, but for dynamic ones... If you still want to create an image with two or more persons, you can do so with inpainting.

If there are questions or problems, or if you see errors and/or flaws in the workflow, or have suggestions or tips for me, feel free to leave a comment 😀
