
Experiment: Teaching SD15 - Flow Lune a ControlNet dataset at timesteps 700-900 and 600-900


Current Released Lune

https://civitai.com/models/2111161/lune-flow-matching-sd15-flux

Lune Still Remembers Laion

Training the student from its original form on a teacher's images damaged and reorganized its internals, but did not destroy them.

Target Lune

https://huggingface.co/AbstractPhil/sd15-flow-lune-flux/blob/main/sd15_flow_ffhq_low_t_portraits_s5000.pt

This Lune checkpoint was selected because it was post-trained on portraits at lower timesteps (0-500), and I wanted more meaningful control over those portraits.

Dataset and Purpose

https://huggingface.co/datasets/AbstractPhil/CN_pose3D_V10_512

This dataset is a resized and mask-synthesized variant of:

https://huggingface.co/datasets/tori29umai/CN_pose3D_V10

Masking

Training utilized masking: the "simple background" and "white background" tags were removed, and "transparent background" was appended to the end of each prompt.
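
The prompt rewrite described above could look roughly like the sketch below. It assumes the captions are comma-separated tag lists; the function name `rewrite_prompt` is hypothetical and is not taken from the actual training script.

```python
def rewrite_prompt(prompt: str) -> str:
    """Drop the two background tags and append 'transparent background'.

    Assumption: the prompt is a comma-separated list of tags.
    """
    removed = {"simple background", "white background"}
    # Split into tags, trimming whitespace and skipping empty entries.
    tags = [tag.strip() for tag in prompt.split(",")]
    kept = [tag for tag in tags if tag and tag.lower() not in removed]
    kept.append("transparent background")
    return ", ".join(kept)


if __name__ == "__main__":
    print(rewrite_prompt("1girl, solo, simple background, white background, standing"))
```

Matching on whole tags rather than substrings keeps unrelated tags that merely contain the word "background" untouched.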

The binary mask was used in this fashion:

    def get_prediction(batch, log_to=None):
        latents, masks, encoder_hidden_states, ids, prompts = batch
        
        latents = latents.to(dtype=torch.float32, device=device)
        masks = masks.to(dtype=torch.float32, device=device)
        encoder_hidden_states = encoder_hidden_states.to(dtype=torch.float32, device=device)
        
        batch_size = latents.shape[0]
        
        # Apply dropout for CFG support
        dropout_mask = torch.rand(batch_size, device=device) < config.dropout
        encoder_hidden_states = encoder_hidden_states.clone()
        encoder_hidden_states[dropout_mask] = 0
        
        # Sample timesteps with shift - constrained to [min_timestep, max_timestep]
        min_sigma = config.min_timestep / 1000.0
        max_sigma = config.max_timestep / 1000.0
        
        sigmas = torch.rand(batch_size, device=device)
        sigmas = min_sigma + sigmas * (max_sigma - min_sigma)
        
        # Apply shift transformation
        sigmas = (config.shift * sigmas) / (1 + (config.shift - 1) * sigmas)
        timesteps = sigmas * 1000
        sigmas = sigmas[:, None, None, None]
        
        # Flow matching
        noise = torch.randn_like(latents)
        noisy_latents = noise * sigmas + latents * (1 - sigmas)
        target = noise - latents
        
        # Predict velocity (standard 4-channel input)
        pred = unet(noisy_latents, timesteps, encoder_hidden_states, return_dict=False)[0]
        
        # Calculate loss with mask applied
        loss = F.mse_loss(pred, target, reduction="none")
        loss = loss.mean(dim=1)  # Average over channels: [B, H, W]
        
        # Apply mask: only compute loss on non-masked regions
        # masks: [B, H, W] with 1=keep, 0=ignore
        masked_loss = loss * masks
        
        # Average over spatial dimensions, weighted by mask
        loss_per_sample = masked_loss.sum(dim=[1, 2]) / (masks.sum(dim=[1, 2]) + 1e-8)
        
        if log_to is not None:
            for i in range(batch_size):
                log_to["train_step"].append(global_step)
                log_to["train_loss"].append(loss_per_sample[i].item())
                log_to["train_timestep"].append(timesteps[i].item())
                log_to["trained_images"].append({
                    "step": global_step,
                    "id": ids[i],
                    "prompt": prompts[i]
                })
        
        return loss_per_sample.mean()
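
One detail of the sampling code above is worth a worked example: the shift transformation is applied after the sigmas are drawn from [min_timestep, max_timestep], so the timesteps the UNet actually sees depend on `config.shift`. The sketch below reuses the same formula to compute the effective range. The shift values shown are assumptions for illustration only, since the value used in these runs is not stated here.

```python
def shifted_timestep_range(min_timestep: float, max_timestep: float, shift: float):
    """Return the (low, high) timesteps after the shift transformation.

    Uses the same formula as the training code:
        sigma' = shift * sigma / (1 + (shift - 1) * sigma)
    The mapping is monotonically increasing for shift > 0, so mapping
    the two endpoints is enough to get the effective range.
    """
    def apply_shift(sigma: float) -> float:
        return (shift * sigma) / (1 + (shift - 1) * sigma)

    low = apply_shift(min_timestep / 1000.0) * 1000
    high = apply_shift(max_timestep / 1000.0) * 1000
    return low, high


if __name__ == "__main__":
    # Hypothetical shift values; shift=1.0 leaves the range unchanged.
    for shift in (1.0, 2.0, 3.0):
        low, high = shifted_timestep_range(700, 900, shift)
        print(f"shift={shift}: timesteps {low:.1f} to {high:.1f}")
```

With a shift of 1 the range stays at 700-900; any shift above 1 pushes both ends toward 1000.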

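In other words, the per-pixel MSE is averaged over channels, multiplied by the binary mask, and normalized by the number of kept pixels, so masked-out regions contribute nothing to the loss.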
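
To make the effect of the mask concrete, here is a small standalone sketch of the same loss computation on toy tensors. The helper name `masked_mse` and the toy values are illustrative assumptions; the arithmetic mirrors the masked section of `get_prediction`.

```python
import torch
import torch.nn.functional as F


def masked_mse(pred, target, masks, eps=1e-8):
    """Per-sample MSE over kept pixels only.

    pred, target: [B, C, H, W]; masks: [B, H, W] with 1=keep, 0=ignore.
    """
    loss = F.mse_loss(pred, target, reduction="none").mean(dim=1)  # [B, H, W]
    masked = loss * masks
    # Normalize by the number of kept pixels, not by H * W.
    return masked.sum(dim=[1, 2]) / (masks.sum(dim=[1, 2]) + eps)


if __name__ == "__main__":
    pred = torch.zeros(1, 4, 2, 2)
    target = torch.zeros(1, 4, 2, 2)
    target[0, :, 0, 0] = 1.0    # small error on a kept pixel
    target[0, :, 1, 1] = 10.0   # large error on an ignored pixel
    masks = torch.tensor([[[1.0, 1.0], [1.0, 0.0]]])

    # The ignored pixel's error of 100 does not reach the loss:
    # the result is 1 / 3, the squared error of 1 spread over 3 kept pixels.
    print(masked_mse(pred, target, masks))
```

Because the denominator is the mask sum rather than the pixel count, samples with small foreground regions are not down-weighted relative to samples with large ones.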

There are multiple alternative methodologies I plan to explore, including feathering, Gaussian alpha masking, and more.

More than likely Gaussian masking will be the most useful, and I plan to run that variation next.
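
As a rough idea of what Gaussian alpha masking could look like, the sketch below blurs a binary mask into a soft [0, 1] weight map. This is one possible approach under my own assumptions (kernel size, sigma, replicate padding, and the name `gaussian_soften_mask` are all hypothetical), not the implementation planned for the next run.

```python
import torch
import torch.nn.functional as F


def gaussian_soften_mask(masks, kernel_size=5, sigma=1.0):
    """Blur binary masks [B, H, W] into soft masks [B, H, W] in [0, 1].

    Assumption: kernel_size is odd, so the output keeps the input size.
    """
    half = (kernel_size - 1) / 2
    coords = torch.arange(kernel_size, dtype=masks.dtype, device=masks.device) - half
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g = g / g.sum()  # 1D Gaussian that sums to 1
    kernel = (g[:, None] * g[None, :])[None, None]  # [1, 1, k, k]

    pad = kernel_size // 2
    x = masks[:, None]  # [B, 1, H, W]
    # Replicate padding avoids darkening the mask at the image border.
    x = F.pad(x, (pad, pad, pad, pad), mode="replicate")
    soft = F.conv2d(x, kernel)
    return soft[:, 0].clamp(0.0, 1.0)


if __name__ == "__main__":
    masks = torch.zeros(1, 8, 8)
    masks[0, 2:6, 2:6] = 1.0  # a square foreground region
    soft = gaussian_soften_mask(masks)
    print(soft[0])
```

A soft mask like this can be passed to the existing loss unchanged, since dividing by `masks.sum()` turns the result into a weighted average over the feathered region.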

No Mask Training In-Progress

The outcomes were quite interesting.

Let's label the two runs Try1 and Try2:

Try1 = steps 600-900

  • I consider Try1 the weaker run, as it seems to have damaged much of the model's context awareness.

Try2 = steps 700-900

  • Try2 is my current front-runner and may be adopted if the alpha-mask training runs go badly.

Universal Negative Prompt

Below is the universal negative prompt I will be synthesizing images with. This is the negative prompt, not the positive prompt; even so, it will likely result in strange and deformed effects for this version. Be warned: this model is research-heavy and not perfected yet.

nsfw, nudity, nude, upscaled, ((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)), out of frame, extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck))),

Baseline Test

Settings

image.png

Baseline: a woman

SD15

image.png

SD15 LUNE PRETRAINED Release Version

image.png

Multiplied Latent

image.png

SD15 Lune-Flux Release Version

image.png

SD15 Lune-Flux ControlNet - timesteps 600-900

image.png

SD15 Lune-Flux ControlNet - timesteps 700-900

image.png

Test 1: Portrait

Unless labeled otherwise, the first image in each pair is Try1 and the second is Try2.

3d, 1girl, from side, portrait
image.pngimage.png

3d, 1girl, from side, blue hair, black sirt, portrait
image.pngimage.png

3d, 1girl, from side, blue hair, black shirt, green eyes, portrait
image.pngimage.png

photograph of a 1girl, from side, blue hair, black shirt, green eyes, portrait
image.pngimage.png

a portrait photograph of a woman viewed from side. she has long wavy blue hair and is wearing a black shirt. Her eyes are deep green
image.pngimage.png

a portrait photograph of a woman viewed from side. she has long wavy blue hair and is wearing a black shirt. Her eyes are deep green.

real, photograph, RAW photograph, photorealistic, 
image.pngimage.png

a portrait photograph of a beautiful gothic woman viewed from side. she has long wavy dark cobalt blue long hair and is wearing a black shirt. Her eyes are deep green and her lips black. 

real, photograph, RAW photograph, photorealistic, 
image.pngimage.png

a portrait photograph of a beautiful gothic woman viewed from side. she has long wavy dark cobalt blue long hair and is wearing a black shirt. Her eyes are deep green and her lips black. 

real, photograph, RAW photograph, photorealistic, sharp and perfect photo, fashion, 
image.pngimage.png

anime cartoon portrait photograph of a beautiful gothic woman viewed from side. she has long wavy dark cobalt blue long hair and is wearing a black shirt. Her eyes are deep green and her lips black. 

real, photograph, RAW photograph, photorealistic, sharp and perfect photo, fashion, 
image.pngimage.png

So far both are highly responsive. Let's try full-body poses now.

Test 2: Cowboy Shot

Neither model learned this prior to ControlNet training, so let's see whether the ControlNet training actually taught it.

3d, 1girl, cowboy shot, blue hair, red eyes, 
image.pngimage.png
3d, 1girl, cowboy shot, blue hair, purple eyes, 
image.pngimage.png

Neither model learned cowboy shot from these prompts. Let's see if they know the actual tag, then.

image.pngimage.png

As expected, the narrower timestep range (700-900) instilled some behavior without destroying the style.

1girl, cowboy shot, face, eyes, upper body, shoulders, from side, 
image.pngimage.png

1girl, cowboy shot, face, eyes, upper body, shoulders, from behind
image.pngimage.png

1girl, cowboy shot, face, eyes, upper body, shoulders, from behind,

red dress, blue collar, purse, 
image.pngimage.png

1girl, cowboy shot, face, eyes, upper body, shoulders, from behind,

red dress, blue collar, purse, purple elbow gloves, 
image.pngimage.png

Test 3: Sitting

1girl, from side, sitting
image.pngimage.png

1girl, from side, sitting on a chair
image.pngimage.png

1girl, from side, sitting on a chair, blue eyes, brown hair, 
image.pngimage.png

1girl, from side, sitting on a chair, blue eyes, brown hair, 

kitchen, kitchen table, depth of field, complex background, 
image.pngimage.png

1girl, from side, sitting on a chair, blue eyes, brown hair, 

bedroom, computer chair, posters, computer desk, computer monitor, depth of field, complex background, 
image.pngimage.png

Sitting does not appear to work at all with the portrait and ControlNet system yet.

a beautiful woman sitting on a chair in a restaurant
image.pngimage.png

a beautiful woman sitting on a chair in a restaurant, blue dress
image.pngimage.png

The plain-English prompts took just fine.

Experiments ongoing

The next update will be based on how well masked training took compared with non-masked training across the different timestep tests.
