Showcase: Extreme SD 1.5 Optimization on Intel i3-1005G1 (Ice Lake)
Greetings, @intel_OpenVINO and community. I want to share the results of a deep optimization project: running Stable Diffusion 1.5 with ControlNet on a mobile dual-core processor, with stable timings of about 17 s per iteration.
1. The OS Stack & Hardware Tuning
CPU: Intel Core i3-1005G1 (2C/4T)
OS: CachyOS (XFS on NVMe)
Kernel: linux-bore (Optimized for system responsiveness under 100% CPU load)
Power: Unlocked TDP to 21W (PL1/PL2) | Turbo 3.2 GHz.
RAM: 12GB + 8GB NVMe Swap.
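As a quick sanity check on the power settings listed above, the active limits can be read back from the Linux powercap interface. This is a minimal sketch, assuming the package domain is exposed at /sys/class/powercap/intel-rapl:0 (typical for Intel laptops, but not confirmed for this machine); the helper name read_power_limits is hypothetical.

```python
from pathlib import Path

# Assumption: package-0 RAPL domain; the exact path can differ per system.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_power_limits(rapl_dir=RAPL_DIR):
    """Return {constraint_name: watts} for every RAPL constraint in rapl_dir."""
    limits = {}
    for name_file in sorted(Path(rapl_dir).glob("constraint_*_name")):
        # File names look like constraint_0_name, constraint_1_name, ...
        index = name_file.name.split("_")[1]
        limit_file = name_file.with_name(f"constraint_{index}_power_limit_uw")
        if limit_file.exists():
            microwatts = int(limit_file.read_text().strip())
            limits[name_file.read_text().strip()] = microwatts / 1_000_000
    return limits
```

On a system where both limits are set to 21 W, read_power_limits() would be expected to report 21.0 for the long_term (PL1) and short_term (PL2) constraints.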
2. Generation Pipeline (OpenVINO)
The key was using optimum-intel with INT8 quantization, which shrinks the weights and so eases the memory-bandwidth bottleneck of the i3.
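To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch of the UNet weight footprint at different precisions. The parameter count (about 860 million) is an assumption based on the commonly quoted size of the SD 1.5 UNet, not a figure measured on this setup, and the function name is hypothetical.

```python
# Assumption: approximate SD 1.5 UNet parameter count, not measured here.
UNET_PARAMS = 860_000_000

BYTES_PER_WEIGHT = {"FP32": 4, "FP16": 2, "INT8": 1}


def weight_footprint_gib(params, precision):
    """Size of the weights in GiB if every parameter uses the given precision."""
    return params * BYTES_PER_WEIGHT[precision] / 2**30


for precision in BYTES_PER_WEIGHT:
    print(f"{precision}: {weight_footprint_gib(UNET_PARAMS, precision):.2f} GiB")
```

Since every denoising step has to stream the weights through the memory bus, halving the bytes per weight (FP16 to INT8) roughly halves that traffic, which is where a bandwidth-limited CPU would be expected to benefit.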
Threads: INFERENCE_NUM_THREADS: 4 (1:1 mapping to logical threads).
Scheduler: Euler Ancestral (15 steps).
Resolution: 512x768 (width x height) with ControlNet Canny.
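The 1:1 thread mapping above can also be derived from the machine instead of being hard-coded. This is a small sketch: the two config keys are the ones used in this post, while the helper build_ov_config is a hypothetical name introduced for illustration.

```python
import os


def build_ov_config(num_threads=None):
    """Build an OpenVINO config dict with one inference thread per logical CPU."""
    if num_threads is None:
        # os.cpu_count() reports logical CPUs (4 on a 2C/4T part); it may return None.
        num_threads = os.cpu_count() or 1
    return {
        "INFERENCE_NUM_THREADS": str(num_threads),
        "PERFORMANCE_HINT": "LATENCY",
    }


print(build_ov_config(4))
```

Passing an explicit value keeps the run reproducible; leaving it unset adapts to whatever machine the script lands on.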
Real Performance:
Iter 1: 16.91s/it
Iter 2: 17.53s/it
Iter 3: 16.64s/it
Average: ~17 s/it (approx. 4:15 min per image at 15 steps).
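The per-image estimate follows directly from the three timings above. A short sketch of the arithmetic, assuming the remaining steps run at the same pace as the three sampled ones:

```python
from statistics import mean

step_times = [16.91, 17.53, 16.64]  # seconds per iteration, from the log above
steps = 15                          # num_inference_steps

avg = mean(step_times)
total = avg * steps
minutes, seconds = divmod(round(total), 60)

print(f"{avg:.2f} s/it, {minutes}:{seconds:02d} per image")
```

This covers only the denoising loop; model loading, compilation, and image decoding are not included in the figure.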
3. Optimized Inference Script
Python
from optimum.intel.openvino import OVStableDiffusionPipeline
from diffusers import EulerAncestralDiscreteScheduler

# Setup
base_path = "/home/desk/Escritorio/ROMSF5"
# Assumption: the exported OpenVINO IR model sits directly in base_path;
# the original script did not show how model_path was defined.
model_path = base_path
ov_config = {"INFERENCE_NUM_THREADS": "4", "PERFORMANCE_HINT": "LATENCY"}

# Load the optimized IR model; defer compilation until the static shape is set
pipe = OVStableDiffusionPipeline.from_pretrained(
    model_path, compile=False, ov_config=ov_config, device="CPU"
)
# Apply the Euler Ancestral scheduler listed in section 2
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
# Skip the safety checker: return the images unchanged, flagged as not NSFW
pipe.run_safety_checker = lambda image, device, dtype: (image, [False] * len(image))
# Fix the input shape, then compile once for that shape
pipe.reshape(batch_size=1, height=768, width=512, num_images_per_prompt=1)
pipe.compile()

# Generation (the ControlNet Canny conditioning is not shown in this excerpt)
prompt = "[lora:detailedteeth_il_v1:0.5], (tongue peek:1.4), (parted lips:1.2), 1girl, gradient hair"
negative_prompt = "long tongue, open mouth, sharp teeth"
output = pipe(prompt=prompt, negative_prompt=negative_prompt, num_inference_steps=15)
output.images[0].save("output.png")
4. Post-Processing: Vulkan Upscaling
To finalize the "Cinema" look without overloading the CPU, I use the Intel UHD iGPU via Vulkan for a 2x upscale.
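Before running a batch like this, it can help to confirm that the upscaler binary is reachable and to see which files would be picked up. The helpers below are a hypothetical sketch (the names pending_images and upscaler_available are not part of any tool); they use only the standard library and do not launch the upscaler.

```python
import shutil
from pathlib import Path


def pending_images(folder, prefix="ULTRA_"):
    """Return PNG files in folder that do not carry the upscaled prefix yet."""
    return sorted(
        p for p in Path(folder).glob("*.png")
        if not p.name.startswith(prefix)
    )


def upscaler_available(binary="realesrgan-ncnn-vulkan"):
    """True if the upscaler executable can be found on PATH."""
    return shutil.which(binary) is not None
```

A typical use would be to call upscaler_available() once and stop early with a clear message if it returns False, rather than failing on the first subprocess call.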
Python
import os, subprocess
# Scaling via iGPU (Vulkan)
imagenes = [f for f in os.listdir("./") if f.endswith(".png") and not f.startswith("ULTRA_")]
for img in imagenes:
    output_name = f"ULTRA_{img}"
    cmd = ["realesrgan-ncnn-vulkan", "-i", img, "-o", output_name, "-n", "realesr-animevideov3", "-s", "2"]
    subprocess.run(cmd, check=True)
    # Delete the original only after the upscaled file exists
    if os.path.exists(output_name):
        os.remove(img)
Final Thoughts
This setup proves that Intel OpenVINO and proper system tuning (CachyOS/Bore) can transform a "low-end" 10th Gen i3 into a capable AI workstation for hobbyists. It’s not about the core count; it’s about the optimization.
Special thanks to @intel_Support and the OpenVINO devs for the amazing toolkit.
