This article covers how to install and use the Auto1111/ForgeUI SegmentAnything extension, which combines GroundingDINO (to detect bounding boxes for concepts in any image) with SegmentAnything (to generate masks from those boxes) for LoRA training, img2img, ControlNet...
It can be found here : https://github.com/continue-revolution/sd-webui-segment-anything
This extension is very powerful and is, IMHO, worth installing Auto1111/ForgeUI for on its own.
NB : If you know alternative solutions, for ComfyUI or whatever, please leave a comment :-)
Installation :
The simplest way to install it is using the "Extensions" tab > "Install from URL" in Auto1111/ForgeUI.
Follow the readme on github. DO READ IT at some point. It contains a lot of valuable information, even very interesting stuff about Controlnet I have not mastered yet.
There are also some videos showing how it works. It's pretty straightforward tbh.
It will not work out of the box : you need to download a SAM model from the readme page and place it in your extension folder (for me "C:\Forge\webui\extensions\sd-webui-segment-anything-altoids"). I'm using the largest SAM-HQ one.
Also, go to Settings tab > Segment Anything and tick "Use local groundingdino to bypass C++ problem" to get rid of some annoying messages in the terminal further on.
How to use it to generate masks
Quick explanation of how it works to generate a single mask before we move on to batch generation for the whole dataset :-)
You go to either Txt2img or Img2Img tab then you unfold SegmentAnything.
Then you have to select the SAM model you want to use, and upload a picture.
There you have two options :
Click on the picture to specify, through positive and negative points, what you want to keep or exclude in the picture. I'm not using that : it won't work for batches.
Enable GroundingDino so you can prompt what you want to keep
So you tick Enable GroundingDino and select a GroundingDINO model. Despite being smaller, I prefer the SwinT_OGC one because in my tests I could use "Only Female face" as the Detection Prompt. That's right : it's able to detect only the girl's face !
You can use detection prompts such as : "Face", "Face and hair", "Person", "Body", "Mouth", "Nose", "Mouth and nose", "Eyes". (Tell me if you find good ones please)
It is truly awesome but has some limitations. I haven't found how to use negative prompts with it ("Face without mouth" for instance) ; it doesn't seem possible with the extension.
The Box Threshold defines how tolerant the detection is. The default value of 0.3 is fine.
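To give an idea of what that threshold does, here is a minimal sketch assuming GroundingDINO returns one confidence score per candidate box (the function and variable names are illustrative, not the extension's actual code) :

```python
# Hypothetical sketch of Box Threshold filtering: GroundingDINO proposes
# candidate boxes with confidence scores, and only boxes at or above the
# threshold are kept for mask generation.
def filter_boxes(boxes, scores, box_threshold=0.3):
    """Keep only detection boxes whose confidence reaches the threshold."""
    return [box for box, score in zip(boxes, scores) if score >= box_threshold]

# Two candidate "Face" boxes: one confident, one borderline noise.
boxes = [(120, 40, 260, 200), (300, 310, 330, 345)]
scores = [0.82, 0.12]
kept = filter_boxes(boxes, scores, box_threshold=0.3)
# Only the confident box survives the default 0.3 threshold.
```

Lowering the threshold keeps more (possibly spurious) boxes ; raising it makes detection stricter.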
Click on "I want to preview" so you can test your prompt on your picture. Depending on your prompt you will see one or several boxes appear.
Here I prompted just "Face" so I get two boxes, number 0 and number 1. Then you can select which boxes you want to generate masks from using the tickboxes.
Another example using "Only female face". Please note it doesn't always work, that's why I had to change the pic :-D
I think GroundingDino is the best at this game at the moment though.
So now I run SegmentAnything using "Preview Segmentation" on that box 0, which will output a grid of three possible sets of "Blend" / "Mask" / "Masked image". Here's one :
The blend is the picture with the mask and the GroundingDINO box overlaid, so you can check the detection process.
The mask is, well, the mask.
The masked image is what was cropped, and could be useful for other purposes.
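A toy numpy sketch of how those three outputs relate to each other, assuming a binary mask over an RGB image (the array names and the red overlay colour are my own illustration, not the extension's code) :

```python
import numpy as np

# A tiny 4x4 grey "picture" and a boolean mask covering its centre.
h, w = 4, 4
image = np.full((h, w, 3), 200, dtype=np.uint8)   # plain grey picture
mask = np.zeros((h, w), dtype=bool)
mask[1:3, 1:3] = True                             # the detected "face" region

# Masked image: keep only the pixels under the mask, black elsewhere.
masked_image = np.where(mask[..., None], image, 0)

# Blend: the original picture with the mask tinted on top (50% red overlay),
# so you can eyeball what the segmentation actually selected.
overlay = np.zeros_like(image)
overlay[..., 0] = 255                             # pure red
blend = np.where(mask[..., None], image // 2 + overlay // 2, image)
```

The mask itself is just the boolean array (saved as a black-and-white image) ; the other two are derived from it as above.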
There are other interesting options below that I will let you discover. Basically you can pick your favorite mask out of the 3 sets and then use shortcuts to Inpainting / Controlnet. I don't really use that stuff yet but it looks very valuable. See the readme.
Expand mask is useful to re-upload a mask and, as its name says, expand/dilate it.
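Dilation simply grows the mask outward by a few pixels, which helps when a mask hugs the subject too tightly. A hand-rolled one-pixel-per-step sketch in numpy (the extension presumably uses a proper morphology routine ; `expand_mask` is my own illustrative name) :

```python
import numpy as np

def expand_mask(mask, pixels=1):
    """Grow a boolean mask outward by `pixels`, one 4-neighbour dilation per step."""
    out = mask.copy()
    for _ in range(pixels):
        padded = np.pad(out, 1)  # pad with False so edges dilate safely
        out = (padded[1:-1, 1:-1]          # the mask itself
               | padded[:-2, 1:-1]         # shifted down
               | padded[2:, 1:-1]          # shifted up
               | padded[1:-1, :-2]         # shifted right
               | padded[1:-1, 2:])         # shifted left
    return out

mask = np.zeros((5, 5), dtype=bool)
mask[2, 2] = True
grown = expand_mask(mask, pixels=1)
# The single centre pixel grows into a plus-shaped 5-pixel region.
```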
How to use it to batch generate masks for a whole dataset
I cover this subject in my other article "Training non-Face altering LoRas : Full workflow" https://civitai.com/articles/8974