Intro

BAIDU ERNIE Image Turbo (8 steps) is the Turbo version of ERNIE Image (50 steps). Both models were released on Hugging Face on 2026-04-17, and both are available on Civitai, for download (15 GB) and for on-site generation. Both are Apache 2.0 licensed: actually open-weights models, not "kind of".

Generation cost

Generation cost without tips exactly matches Z Image:

ERNIE Image - 20 BUZZ
ERNIE Image Turbo - 8 BUZZ
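
To make the difference concrete, here is a tiny sketch of how far a Buzz budget goes with each model. The prices are the ones listed above; the function name and the budget of 100 Buzz are made up for the example, and I assume the price is per image with no tips.

```python
# On-site generation prices quoted above, in Buzz (assumed per image, no tips).
BUZZ_PER_IMAGE = {
    "ERNIE Image": 20,
    "ERNIE Image Turbo": 8,
}


def images_for_budget(model: str, budget_buzz: int) -> int:
    """Return how many whole images a Buzz budget covers for a model."""
    return budget_buzz // BUZZ_PER_IMAGE[model]


if __name__ == "__main__":
    # 100 Buzz is an arbitrary example budget.
    for name in BUZZ_PER_IMAGE:
        print(name, images_for_budget(name, 100))
```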

Built-In PE

I believe Civitai does not use it, but the models have a built-in Prompt Enhancer (PE). That is something of a new trend: HiDream O1, Microsoft Lens, Ideogram 4 and most new models include a PE. It usually does not help with good, detailed prompts, and it uses additional time and memory, so I think most people will not turn it on. But it exists, and it is usable. The fun part is that it translates the prompt to Chinese, yet at the same time increases the chance of getting Caucasian-looking people instead of Asian. Let us check that. We will use HF Spaces to generate the images. The seed will be 20260619, in case you for some reason want to recreate them.

The prompt is:

woman in New York subway closup professional editorial photo

same prompt done with KREA2 - https://civitai.red/images/134723130 (black)

What you see there is issue #1 of this model: a tendency to generate Asian faces, no matter what the prompt language is or what place the prompt describes. It just defaults to Asian faces every time.

Let us check the use_pe option and unleash the prompt enhancer. The seed is exactly the same. The (enhanced) prompt is now:

专业时尚杂志风格的摄影照片。画面特写了一位身处纽约地铁中的时尚女性。她位于画面中央，面部和肩部占据主导位置。她身穿质感上乘的经典驼色双排扣风衣，内搭黑色高领针织衫，展现出纽约都市的摩登气质。女性拥有精致的五官和无暇的底妆，眼妆深邃，眼神自信地直视镜头，几缕微卷的棕色发丝自然地散落在脸颊旁。

背景是典型的纽约地铁环境，采用了极浅的景深处理以突出主体。人物身后隐约可见银色的不锈钢垂直扶手立柱和斑驳的金属车厢壁。透过虚化的背景，可以辨认出纽约地铁站标志性的白色方形复古瓷砖墙，墙上带有模糊的马赛克站牌标识，隐约可见黑底白字的 'Times Sq - 42 St' 字样。

画面构图为1:1的正方形比例，极具视觉冲击力。光线呈现出高级的电影级质感，车厢顶部偏冷的顶侧荧光灯与侧面透入的暖色环境光交织，完美勾勒出人物清晰的下颌线和立体的面部轮廓。整体色调偏向经典的冷暖对比，画面极其锐利，焦点完全锁定在女性明亮且充满故事感的双眼上，完美契合高端编辑类大片的摄影标准。

same prompt done with KREA2 - https://civitai.red/images/134723057 (Asian)

Caucasian... Interesting. A tiled wall in a subway car, emm... that is kind of wrong... Let's translate the prompt. I bet you want to know what ERNIE actually put into the enhanced version.

A professional fashion magazine-style photograph. The image features a close-up of a stylish woman in the New York subway. She is positioned in the center of the frame, her face and shoulders dominating the composition. She wears a high-quality classic camel double-breasted trench coat over a black turtleneck sweater, exuding a modern New York vibe. The woman has delicate features and flawless makeup, with deep-set eyes and a confident gaze directly at the camera. A few strands of slightly wavy brown hair fall naturally beside her face.

The background is a typical New York subway environment, using a very shallow depth of field to emphasize the subject. Behind her, the silver stainless steel vertical handrail pillars and the weathered metal walls of the subway car are faintly visible. Through the blurred background, the iconic white square retro tile walls of New York subway stations are discernible, with blurred mosaic station signs, and the words 'Times Sq - 42 St' in white lettering on a black background are faintly visible.

The composition is a 1:1 square, creating a strong visual impact. The lighting possesses a high-end, cinematic quality. The cool-toned fluorescent lights on the ceiling of the carriage intertwine with the warm ambient light filtering in from the sides, perfectly outlining the subject's defined jawline and sculpted facial contours. The overall color tone leans towards a classic contrast between warm and cool tones, the image is extremely sharp, and the focus is completely locked on the woman's bright and expressive eyes, perfectly matching the photographic standards of high-end editorial films.

I think you are also wondering what happens if we use the translated prompt instead of the Chinese version. OK, I am curious too. (All of these are generated while I write the text, so I have no idea what output I will get.) So, the result is:

same prompt done with KREA2 - https://civitai.red/images/134723058 (Caucasian)

It is not the same, but close enough.

I think we are ready to discuss the issues.

Model issues

Tendency toward Asian faces
1. Use the use_pe option as a workaround
2. Or add "Caucasian" or "blond" to the prompt
Diagonal artifacts
1. I do not know any good workaround. Avoid dark colors... maybe
Anatomy issues
1. More steps
2. A different seed

So, we define the first issue as a tendency toward Asian faces. That problem (if it is one) has workarounds: use the PE, or ask for Caucasian explicitly.

Now on to the second issue. Let us use a dark prompt.

Single red on a curvy mountain road at night. The car head lights are beaming through fog. Photo from above. Low key.

You see it? OK, maybe it is too subtle. Let's do another:

dark grey ribbon on black background

Still no? OK, I'll add guiding lines.

So, there is the issue. It might be subtle, but there is no real workaround. A "kind of workaround" is to use another model, for example the non-Turbo one: "base" ERNIE Image does OK. Or use Turbo for lighter images. Or pretend there are no artifacts and everything is just fine (I choose the last).

Almost forgot issue #3: extra limbs, extra fingers, bad anatomy. This one took me some time to catch, since it does not show up in every image, only sometimes. It happens much more often than with Z Image Turbo, but much less often than with FLUX.2 Klein. Same seed as the previous images: 20260619.

Extreme foreshortening of woman hand extend to viewer showing her office id card with her photo and name "Johanna Doe". She hold a card with thumb and pointing finger. On her ring finger is an elegant ring. Nail polish is teal

If you expect that PE might help, it will not.

Trying another seed: 21

So that is kind of it. This time I got a good result on the second attempt, but it may vary. If you get bad fingers, change the seed or add steps. Changing the seed is the first thing to do, then steps, then play with the prompt.
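
The order described above (seed first, then steps, and only then the prompt) can be sketched as a small retry plan. This is just an illustration: the function name, the number of extra seeds, the step increment and the step cap are all my assumptions, and the last stage, editing the prompt, is left to a human.

```python
from typing import Iterator, Tuple


def retry_plan(seed: int, steps: int, extra_seeds: int = 3,
               step_increment: int = 4, max_steps: int = 16
               ) -> Iterator[Tuple[int, int]]:
    """Yield (seed, steps) pairs to try: new seeds first, then more steps.

    All default values are hypothetical, not recommendations from the model card.
    """
    # 1. Cheapest fix: try other seeds at the original step count.
    for offset in range(1, extra_seeds + 1):
        yield seed + offset, steps
    # 2. Only then raise the step count, back on the original seed.
    current = steps + step_increment
    while current <= max_steps:
        yield seed, current
        current += step_increment


if __name__ == "__main__":
    for candidate_seed, candidate_steps in retry_plan(20260619, 8):
        print(candidate_seed, candidate_steps)
```

If every pair in the plan still gives bad hands, that is the point where rewriting the prompt comes in.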

The PE version, BTW, also got better with seed 21.

That, by the way, is a "workflow" for ERNIE: write down a simple prompt idea, run the PE, then translate the PE output to English and continue working from there. Or just use ChatGPT or another LLM to get the prompt right from the very beginning.
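
As a sketch, that workflow is just two functions chained together. Both `enhance` and `translate` below are hypothetical placeholders that you would have to wire to the PE and to a translator yourself; the code does not call any real service or ERNIE API.

```python
from typing import Callable


def build_working_prompt(idea: str,
                         enhance: Callable[[str], str],
                         translate: Callable[[str], str]) -> str:
    """Expand a short idea with the PE, then translate it for manual editing."""
    enhanced = enhance(idea)      # e.g. the Chinese text the PE returns
    return translate(enhanced)    # English draft to keep working on


if __name__ == "__main__":
    # Stand-in functions only, so the example runs without any model.
    def fake_enhance(text: str) -> str:
        return f"[enhanced] {text}"

    def fake_translate(text: str) -> str:
        return f"[english] {text}"

    print(build_working_prompt("woman in New York subway",
                               fake_enhance, fake_translate))
```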

Resources needed

For local generation I have Intel iGPUs on an Intel Core Ultra 9 185H and a 285H, with 64 GB and 128 GB of RAM. Both are able to generate with ERNIE. It consumes about 45 GB of "unified" (V)RAM in bfloat16 with PE enabled. For comparison with similar models: Z Image Turbo 6B (10.2B), FLUX.2 Klein 9B (17.3B), ERNIE Image Turbo 8B (15.3B).
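
A rough sketch of the arithmetic behind memory numbers like that: bfloat16 stores each parameter in 2 bytes, so the weights alone take about 2 GB per billion parameters. I assume here that the numbers in parentheses are total parameter counts of the whole pipeline, which the comparison above does not spell out. Whatever the estimate does not cover (activations, runtime overhead and so on) is not modeled.

```python
BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each value in 16 bits

# Assumption: the parenthesized figures are total parameters, in billions.
TOTAL_PARAMS_BILLION = {
    "Z Image Turbo": 10.2,
    "FLUX.2 Klein 9B": 17.3,
    "ERNIE Image Turbo": 15.3,
}


def weights_gb(params_billion: float,
               bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Approximate weight size in GB (10**9 bytes) for a parameter count."""
    return params_billion * bytes_per_param


if __name__ == "__main__":
    for name, params in TOTAL_PARAMS_BILLION.items():
        print(f"{name}: ~{weights_gb(params):.1f} GB of weights in bfloat16")
```

Under that assumption ERNIE Image Turbo comes out at roughly 30.6 GB of weights, so the weights would account for most, but not all, of the measured 45 GB.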

ERNIE Image Turbo (8 steps, 1024x1024) generation time is 95 seconds
Z Image Turbo (9 steps, 1024x1024) generation time is 85 seconds
FLUX.2 Klein 9B (4 steps, 1024x1024) generation time is 37 seconds

On real GPUs it will be faster, but the relative speed should remain the same.
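
To compare the models per step rather than per image, the timings above can be divided by the step count. A small sketch; keep in mind the totals include everything, not only the denoising steps, so seconds per step is only a rough average. The function names are mine.

```python
# (steps, total seconds) for one 1024x1024 image, as measured above.
TIMINGS = {
    "ERNIE Image Turbo": (8, 95),
    "Z Image Turbo": (9, 85),
    "FLUX.2 Klein 9B": (4, 37),
}


def seconds_per_step(steps: int, total_seconds: float) -> float:
    """Average wall-clock seconds spent per sampling step."""
    return total_seconds / steps


def relative_time(model: str, baseline: str) -> float:
    """How many times longer one image takes with `model` than with `baseline`."""
    return TIMINGS[model][1] / TIMINGS[baseline][1]


if __name__ == "__main__":
    for name, (steps, total) in TIMINGS.items():
        print(f"{name}: {seconds_per_step(steps, total):.2f} s/step")
    ratio = relative_time("ERNIE Image Turbo", "FLUX.2 Klein 9B")
    print(f"ERNIE Image Turbo vs FLUX.2 Klein 9B: {ratio:.2f}x")
```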

Now that you have an overview of ERNIE Image Turbo, you may want to compare it to other models. I have generated a lot of images for this purpose.

Resolution

My tests show that the default resolutions give the best results, but you will get decent results anywhere in the range from 0.5K to 2K.

Default resolutions

The list of default resolutions is:

1024x1024
896x1200, 1200x896
848x1264, 1264x848
768x1376, 1376x768
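
All of these come out at about one megapixel, only the aspect ratio changes. Here is a minimal sketch that checks this and picks the default resolution closest to a wanted aspect ratio. The helper names are mine, and comparing ratios on a log scale is just one reasonable choice, not something the model requires.

```python
import math
from typing import Tuple

# The default resolutions listed above, as (width, height).
DEFAULT_RESOLUTIONS = [
    (1024, 1024),
    (896, 1200), (1200, 896),
    (848, 1264), (1264, 848),
    (768, 1376), (1376, 768),
]


def megapixels(width: int, height: int) -> float:
    """Pixel count in millions."""
    return width * height / 1_000_000


def closest_default(target_width: int, target_height: int) -> Tuple[int, int]:
    """Return the default resolution whose aspect ratio is nearest the target."""
    target_ratio = target_width / target_height
    return min(
        DEFAULT_RESOLUTIONS,
        # Log scale treats 2:1 and 1:2 as equally far from 1:1.
        key=lambda wh: abs(math.log((wh[0] / wh[1]) / target_ratio)),
    )


if __name__ == "__main__":
    for w, h in DEFAULT_RESOLUTIONS:
        print(f"{w}x{h}: {megapixels(w, h):.2f} MP, ratio {w / h:.2f}")
    print(closest_default(1920, 1080))
```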

See images generated in all standard resolutions in the Low key woman portrait test.

But the model actually supports more resolutions and aspect ratios.

Higher resolutions

1920x1200

Example close to Full HD (1920x1200, which is 16:10 rather than the 16:9 of 1920x1080), with prompt:

This is a studio portrait photograph, shot in a 16:9 horizontal format. The image employs minimalist composition and low-key lighting. The background is a solid gradient from dark gray to pure black, devoid of any other environmental elements. Large areas of white space on the left and in the center occupy at least 80% of the negative space, creating a deep and tranquil atmosphere. At the far right edge of the image, a close-up of a European woman's profile is shown. She faces left, revealing only a small portion of her head (including the profile and a few strands of blonde hair). She has typical European features, with a calm and serene expression. A hand partially enters the frame; slender, elegant fingers gently touch her lips, her nails neatly trimmed and painted with a deep red glossy nail polish. The dark lighting precisely illuminates the woman's profile, the tip of her nose, her lips, and her fingers, outlining soft highlights, while the rest of her head naturally fades into the shadows on the right. The overall style is sophisticated and understated, full of dramatic contrasts of light and shadow and spatial tension.

BTW, this is a good image for spotting the diagonal artifacts.

2026-04-29-00059-ERNIE-Image-Turbo-1920x1200-Seed42-CFG1-AG0-STEP8.jpg

1600x400

(Civitai profile cover image format) example with prompt:

5 canvases on a dark navy to black gradient wall, one android robot with small dot-matrix display eyes is painting on the 4th canvas.
1st image - samurai, sunset, silhouette, dramatic lighting, wind, dynamic pose, dynamic angle
2nd image - underwater, sea monster, deep sea, bioluminescence, dark, dynamic view
3rd image - A tranquil sunset over a mirror-like lake reflects a lone tree and flitting birds above.
4th unfinished image - Minimalist brush and ink silhouette of a woman, inspired by traditional sumi-e aesthetics but executed with a modern twist. Flowing, expressive black ink strokes define the sweeping contour of her back and the elegant updo of her hair. The lines taper beautifully from thick to thin, showcasing the natural flexibility of the calligraphy brush. A single, ethereal minimal watercolor bloom in soft blush pink emerges organically from the dark ink of her hair, seamlessly integrating the female form with a delicate floral motif. Crisp, bright white background
5th - blank
Robot is surprised look back at the camera. Paint brush is in the robot hand touching the lines of 4th image. Left bottom corner text "SD.Next + ERNIE Image Turbo", upper right corner text "doing Ai at home"

That image not only has a non-standard resolution, it also has 4 embedded prompts in it.

2026-06-15-02032-ERNIE-Image-Turbo-1600x400-Seed20260615-CFG1-AG-1-STEP8.jpg
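
The two image file names in this section carry the generation settings, so they can be read back with a regular expression. This is a sketch inferred only from those two names; the pattern, the field names and the idea that every file follows the same layout are my assumptions. The AG value is kept as a plain number, since the file name does not say what it stands for.

```python
import re
from typing import Dict, Union

# Pattern inferred from the two file names above (an assumption, not a spec).
FILENAME_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})-(?P<index>\d+)-(?P<model>.+)-"
    r"(?P<width>\d+)x(?P<height>\d+)-Seed(?P<seed>\d+)-"
    r"CFG(?P<cfg>-?\d+(?:\.\d+)?)-AG(?P<ag>-?\d+(?:\.\d+)?)-"
    r"STEP(?P<steps>\d+)\.jpg$"
)


def parse_image_name(name: str) -> Dict[str, Union[str, int, float]]:
    """Split a file name into its settings; raise ValueError if it does not fit."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name}")
    fields = match.groupdict()
    return {
        "date": fields["date"],
        "index": int(fields["index"]),
        "model": fields["model"],
        "width": int(fields["width"]),
        "height": int(fields["height"]),
        "seed": int(fields["seed"]),
        "cfg": float(fields["cfg"]),
        "ag": float(fields["ag"]),  # may be negative, as in "AG-1"
        "steps": int(fields["steps"]),
    }


if __name__ == "__main__":
    print(parse_image_name(
        "2026-06-15-02032-ERNIE-Image-Turbo-1600x400-Seed20260615-CFG1-AG-1-STEP8.jpg"
    ))
```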

Let us check more resolutions (with PE enabled: I forgot to disable it, but it should not affect the resolution test).

Lower resolutions

128x128

256x256

512x512

So we can see that the image starts to be OK from 512px.

1024x1024

2048x2048

The prompt for the images above is:

A powerful illustrative representation of a lonely lighthouse standing against a cosmic storm, stylized in Stained-Glass Woodcut Expressionism. The lighthouse is a monolith of dark, blocky stone textures bounded by heavy, hand-inked black lines. From its beacon, a brilliant, fractured beam of pure lemon-yellow and stark white light pierces through a turbulent sky made of swirling, interlocking ribbons of indigo, deep magenta, and midnight-blue glass. Crashing waves below are rendered as sharp, jagged shards of turquoise and seafoam glass exploding against the dark cliffs. The composition is defined by its stark contrast, where heavy, impenetrable shadows meet intense, radiant panes of color. The visual rhythm feels handcrafted and ancient, combining the raw, graphic energy of a traditional woodblock print with the luminous, spiritual glow of cathedral stained glass.

ERNIE Image Turbo