Current Released Lune
https://civitai.com/models/2111161/lune-flow-matching-sd15-flux
Lune Still Remembers LAION
Training the student from its original form on a teacher's images damaged and reorganized the internals, but did not destroy them.
Target Lune
This Lune was selected because it had already been portrait-trained at lower timesteps (0-500), and I wanted more meaningful control over those portraits.
Dataset and Purpose
https://huggingface.co/datasets/AbstractPhil/CN_pose3D_V10_512
This dataset is a resized and mask-synthesized variant of:
https://huggingface.co/datasets/tori29umai/CN_pose3D_V10
Masking
Training utilized masking: "simple background" and "white background" were removed from each prompt, and "transparent background" was appended to the end.
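A minimal sketch of that caption edit (the function name and exact tag handling are my own, not taken from the training code):

```python
def retag_prompt(prompt: str) -> str:
    """Drop background tags that conflict with the mask, then mark transparency."""
    # Tags removed because the mask already isolates the subject
    drop = ("simple background", "white background")
    tags = [t.strip() for t in prompt.split(",")]
    tags = [t for t in tags if t and t.lower() not in drop]
    tags.append("transparent background")
    return ", ".join(tags)
```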
The binary mask was used in this fashion (device, config, unet, and global_step come from the surrounding training script):

import torch
import torch.nn.functional as F

def get_prediction(batch, log_to=None):
    latents, masks, encoder_hidden_states, ids, prompts = batch
    latents = latents.to(dtype=torch.float32, device=device)
    masks = masks.to(dtype=torch.float32, device=device)
    encoder_hidden_states = encoder_hidden_states.to(dtype=torch.float32, device=device)
    batch_size = latents.shape[0]

    # Drop conditioning for a random fraction of the batch so CFG works at inference
    dropout_mask = torch.rand(batch_size, device=device) < config.dropout
    encoder_hidden_states = encoder_hidden_states.clone()
    encoder_hidden_states[dropout_mask] = 0

    # Sample timesteps with shift - constrained to [min_timestep, max_timestep]
    min_sigma = config.min_timestep / 1000.0
    max_sigma = config.max_timestep / 1000.0
    sigmas = torch.rand(batch_size, device=device)
    sigmas = min_sigma + sigmas * (max_sigma - min_sigma)

    # Apply shift transformation
    sigmas = (config.shift * sigmas) / (1 + (config.shift - 1) * sigmas)
    timesteps = sigmas * 1000
    sigmas = sigmas[:, None, None, None]

    # Flow matching
    noise = torch.randn_like(latents)
    noisy_latents = noise * sigmas + latents * (1 - sigmas)
    target = noise - latents

    # Predict velocity (standard 4-channel input)
    pred = unet(noisy_latents, timesteps, encoder_hidden_states, return_dict=False)[0]

    # Calculate loss with mask applied
    loss = F.mse_loss(pred, target, reduction="none")
    loss = loss.mean(dim=1)  # Average over channels: [B, H, W]

    # Apply mask: only compute loss on non-masked regions
    # masks: [B, H, W] with 1=keep, 0=ignore
    masked_loss = loss * masks

    # Average over spatial dimensions, weighted by mask
    loss_per_sample = masked_loss.sum(dim=[1, 2]) / (masks.sum(dim=[1, 2]) + 1e-8)

    if log_to is not None:
        for i in range(batch_size):
            log_to["train_step"].append(global_step)
            log_to["train_loss"].append(loss_per_sample[i].item())
            log_to["train_timestep"].append(timesteps[i].item())
            log_to["trained_images"].append({
                "step": global_step,
                "id": ids[i],
                "prompt": prompts[i]
            })

    return loss_per_sample.mean()
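For reference, the shift transformation in the sampler above remaps a uniform sigma toward noisier timesteps when the shift factor is greater than 1. A standalone sketch (the actual config.shift value is not stated in this post; 3.0 here is an assumed illustration):

```python
# s' = (k * s) / (1 + (k - 1) * s), the same formula as in get_prediction.
# k stands in for config.shift; 3.0 is an assumed value, not the trained one.
def shift_sigma(s: float, k: float = 3.0) -> float:
    return (k * s) / (1 + (k - 1) * s)

# With k > 1 the mapping pushes sigmas toward 1 (noisier timesteps):
# shift_sigma(0.5, 3.0) == 0.75, so timestep 500 is sampled as ~750.
```

The endpoints are fixed (0 maps to 0, 1 maps to 1), so the [min_timestep, max_timestep] bounds still hold after shifting only when the bounds themselves are at the extremes; interior bounds get pulled upward.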
There are multiple alternative methodologies I plan to explore, including feathering, gaussian alpha masking, and more.
Gaussian masking will most likely prove the most useful, and I plan to run that variation next.
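One possible feathered variant (my own sketch, not from the training code): blur the binary mask before weighting the loss, so supervision fades smoothly at the subject boundary instead of cutting off hard. Repeated box blurs are used here as a cheap gaussian approximation.

```python
import numpy as np

def feather_mask(mask: np.ndarray, radius: int = 2) -> np.ndarray:
    """Soften a binary {0,1} mask [H, W] by repeated 3x3 box blurs
    (approximates a gaussian). The result is a float weight map in [0, 1]."""
    soft = mask.astype(np.float32)
    h, w = mask.shape
    for _ in range(radius):
        padded = np.pad(soft, 1, mode="edge")
        # 3x3 box average built from shifted slices of the padded array
        soft = sum(
            padded[i:i + h, j:j + w]
            for i in range(3) for j in range(3)
        ) / 9.0
    return soft
```

The feathered map would replace the binary masks in the masked-loss step, i.e. masked_loss = loss * feathered weights, leaving the rest of the loss computation unchanged.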
No Mask Training In-Progress
The outcomes were quite interesting. Let's label them Try1 and Try2:
Try1 = timesteps 600-900
Try1 I consider weaker, as it seems to have damaged much of the model's context awareness.
Try2 = timesteps 700-900
Try2 is my current frontrunner and may be adopted if the alpha-mask runs go badly.
Universal Negative Prompt
Below is the universal negative prompt I will be synthesizing images with. This is not the positive prompt; even so, it will likely produce strange and deformed effects with this version. Be warned: this model is research-heavy and not yet perfected.
nsfw, nudity, nude, upscaled, ((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)), out of frame, extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck))),
Baseline Test
Settings

Baseline: a woman
SD15

SD15 LUNE PRETRAINED Release Version

Multiplied Latent

SD15 Lune-Flux Release Version

SD15 Lune-Flux ControlNet - timesteps 600-900

SD15 Lune-Flux ControlNet - timesteps 700-900

Test 1: Portrait
Unless labeled otherwise, the first image displayed is always Try1 and the second is Try2.
3d, 1girl, from side, portrait

3d, 1girl, from side, blue hair, black shirt, portrait

3d, 1girl, from side, blue hair, black shirt, green eyes, portrait

photograph of a 1girl, from side, blue hair, black shirt, green eyes, portrait

a portrait photograph of a woman viewed from side. she has long wavy blue hair and is wearing a black shirt. Her eyes are deep green

a portrait photograph of a woman viewed from side. she has long wavy blue hair and is wearing a black shirt. Her eyes are deep green.
real, photograph, RAW photograph, photorealistic, 

a portrait photograph of a beautiful gothic woman viewed from side. she has long wavy dark cobalt blue long hair and is wearing a black shirt. Her eyes are deep green and her lips black.
real, photograph, RAW photograph, photorealistic, 

a portrait photograph of a beautiful gothic woman viewed from side. she has long wavy dark cobalt blue long hair and is wearing a black shirt. Her eyes are deep green and her lips black.
real, photograph, RAW photograph, photorealistic, sharp and perfect photo, fashion, 

anime cartoon portrait photograph of a beautiful gothic woman viewed from side. she has long wavy dark cobalt blue long hair and is wearing a black shirt. Her eyes are deep green and her lips black.
real, photograph, RAW photograph, photorealistic, sharp and perfect photo, fashion, 

So far both are highly responsive. Let's try full-body poses now.
Test 2: Cowboy Shot
Neither model learned this prior to ControlNet training, so let's see if the ControlNet actually taught it.
3d, 1girl, cowboy shot, blue hair, red eyes, 

3d, 1girl, cowboy shot, blue hair, purple eyes, 

Neither model learned cowboy shot. Let's see if it knows the actual tag, then.


As expected, the narrower timestep range instilled some of the behavior without destroying the style.
1girl, cowboy shot, face, eyes, upper body, shoulders, from side, 

1girl, cowboy shot, face, eyes, upper body, shoulders, from behind

1girl, cowboy shot, face, eyes, upper body, shoulders, from behind,
red dress, blue collar, purse, 

1girl, cowboy shot, face, eyes, upper body, shoulders, from behind,
red dress, blue collar, purse, purple elbow gloves, 

Test 3: Sitting
1girl, from side, sitting

1girl, from side, sitting on a chair

1girl, from side, sitting on a chair, blue eyes, brown hair, 

1girl, from side, sitting on a chair, blue eyes, brown hair,
kitchen, kitchen table, depth of field, complex background, 

1girl, from side, sitting on a chair, blue eyes, brown hair,
bedroom, computer chair, posters, computer desk, computer monitor, depth of field, complex background, 

Sitting does not appear to work at all with the portrait and ControlNet system yet.
a beautiful woman sitting on a chair in a restaurant

a beautiful woman sitting on a chair in a restaurant, blue dress

The plain-English prompt took just fine.
Experiments ongoing
The next update will compare how well masked training took versus non-masked training across the different timestep ranges.

