svdq-int4_r64-ernie-image

- Updated: May 20, 2026
- Published: May 20, 2026
- Tags: base model, svdq
- Type: Checkpoint Trained
- Base Model: Ernie
- File: nf4 SafeTensor (4-bit normalized), 4.53 GB, 1 variant available
- Hash (AutoV2): E9F86F31B1

# ERNIE Image Turbo — Nunchaku W4A4 Quantized Inference

---

### Introduction

This project adds **W4A4 quantized inference** (4-bit weights and 4-bit activations) for [ERNIE Image Turbo](https://huggingface.co/baidu/ERNIE-Image-Turbo) to [**Nunchaku**](https://github.com/nunchaku-ai/nunchaku). It reduces latency (1.74x faster than BF16 in the reference benchmark below) and memory use, with minimal quality loss.

This work is built on [Nunchaku](https://github.com/nunchaku-ai/nunchaku). We gratefully acknowledge the Nunchaku team's excellent work on efficient diffusion model inference.
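
To make the term W4A4 more concrete, the sketch below shows symmetric 4-bit quantization of a handful of values with one shared scale, in pure Python. It is only an illustration of what "4-bit" storage means. It is not Nunchaku's kernel or its quantization algorithm, and the function names `quantize_int4` and `dequantize_int4` are hypothetical.

```python
def quantize_int4(values):
    """Map floats to integer codes in [-7, 7] that share a single scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric 4-bit range.
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int4(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [0.12, -0.53, 0.97, -0.08, 0.33, -0.71]
codes, scale = quantize_int4(weights)
restored = dequantize_int4(codes, scale)

# The rounding error per value is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each value is stored as a small integer plus a shared scale, which is where the memory saving comes from; the price is a rounding error bounded by half the scale.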


### Installation

```bash
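# Clone the fork that adds ERNIE Image support to Nunchaku
git clone https://github.com/Hzj199/nunchaku.git
cd nunchaku
# Fetch the dependencies tracked as git submodules
git submodule update --init --recursive

# Build a wheel from source, then install it.
# --no-isolation builds in the current environment, so build dependencies
# (such as PyTorch) must already be installed there.
pip install build
NUNCHAKU_BUILD_WHEELS=1 python -m build --wheel --no-isolation
pip install dist/nunchaku-*.whl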
```
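
After the wheel is installed, a quick check like the following can confirm that Python can find the package. This is a minimal sketch using only the standard library; the helper name `nunchaku_status` is hypothetical, and it assumes the distribution is named `nunchaku`, as the wheel filename above suggests.

```python
import importlib.util
from importlib import metadata


def nunchaku_status():
    """Return the installed nunchaku version, or None if it cannot be found."""
    if importlib.util.find_spec("nunchaku") is None:
        return None
    try:
        return metadata.version("nunchaku")
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata under this name.
        return "unknown"


version = nunchaku_status()
if version is None:
    print("nunchaku is not installed in this environment")
else:
    print(f"nunchaku {version} is installed")
```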

### Quick Start

```python
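import torch
from diffusers.pipelines.ernie_image.pipeline_ernie_image import ErnieImagePipeline
from nunchaku import NunchakuErnieImageTransformer2DModel
from nunchaku.utils import get_precision

# Auto-detect the 4-bit format for the current GPU: "int4" or "fp4"
precision = get_precision()
rank = 64  # the "r64" part of the checkpoint filename

# Load the quantized transformer onto the GPU
transformer = NunchakuErnieImageTransformer2DModel.from_pretrained(
    f"ZJMuYun97/ERNIE-Image-Nunchaku/svdq-{precision}_r{rank}-ernie-image.safetensors",
    torch_dtype=torch.bfloat16,
    device="cuda",
)

# Build the pipeline from the original model, swapping in the quantized transformer
pipe = ErnieImagePipeline.from_pretrained(
    "baidu/ERNIE-Image-Turbo",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
    pe=None, pe_tokenizer=None,  # passed as None, so these components are not loaded
)
# Only the transformer is explicitly placed on CUDA above; if the other
# components remain on the CPU, move the pipeline with pipe.to("cuda").

image = pipe(
    prompt="a cute orange cat sitting on a sunlit windowsill",
    height=1024, width=1024,
    num_inference_steps=8,
    guidance_scale=1.0,
    generator=torch.Generator().manual_seed(42),  # fixed seed for reproducible output
).images[0]
image.save("ernie-image.png")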
```
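
The Quick Start builds the checkpoint path from `precision` and `rank`. This page lists a single file (`svdq-int4_r64-ernie-image`), and whether other precision or rank combinations are published is not stated here. The sketch below isolates the naming scheme in a hypothetical helper, `checkpoint_path`, so a typo in either value fails early; it does not check that the file exists.

```python
REPO = "ZJMuYun97/ERNIE-Image-Nunchaku"  # repository used in the Quick Start


def checkpoint_path(precision: str, rank: int = 64) -> str:
    """Build the checkpoint path using the naming scheme from the Quick Start."""
    if precision not in ("int4", "fp4"):
        raise ValueError(f"unexpected precision: {precision!r}")
    if rank <= 0:
        raise ValueError(f"rank must be positive, got {rank}")
    return f"{REPO}/svdq-{precision}_r{rank}-ernie-image.safetensors"


print(checkpoint_path("int4"))
```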

### Performance (Reference)

Measured on a single A800 GPU at 1024×1024 resolution with 8 inference steps:

| Model | Average latency (s) | Speedup |
|-------|---------------------|---------|
| Original BF16 | 4.89 | 1.00x |
| **Nunchaku W4A4** | **2.81** | **1.74x** |
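
The speedup column is the ratio of the two average latencies. The sketch below reproduces that arithmetic and shows one way an average latency could be measured. The helper names are hypothetical and this is not the benchmark script behind the table. When timing GPU work, the timed function would also need to wait for the device to finish (for example with `torch.cuda.synchronize()`) before the clock is read.

```python
import time
from statistics import mean


def average_latency(fn, warmup=1, runs=5):
    """Average wall-clock time of fn() over several runs, after warm-up calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return mean(samples)


def speedup(baseline_s, optimized_s):
    """How many times faster the optimized run is than the baseline."""
    return baseline_s / optimized_s


# Latencies from the table above, in seconds.
print(f"{speedup(4.89, 2.81):.2f}x")  # 1.74x
```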

### Notes

- Only `batch_size=1` is supported, which matches the typical single-image inference use case.
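
Because each call handles a single image, several prompts can be processed by calling the pipeline once per prompt. The sketch below shows that loop with a stand-in function so it runs without a GPU. `generate_sequentially` and `fake_generate` are hypothetical names; in real use, the stand-in would wrap a call such as `pipe(prompt=..., generator=...).images[0]` from the Quick Start.

```python
from typing import Callable, Iterable, List, TypeVar

Image = TypeVar("Image")


def generate_sequentially(
    generate_one: Callable[[str, int], Image],
    prompts: Iterable[str],
    base_seed: int = 42,
) -> List[Image]:
    """Call generate_one once per prompt, giving each prompt its own seed."""
    return [generate_one(prompt, base_seed + i) for i, prompt in enumerate(prompts)]


def fake_generate(prompt: str, seed: int) -> str:
    """Stand-in for a single-image pipeline call; returns a description string."""
    return f"image(prompt={prompt!r}, seed={seed})"


results = generate_sequentially(fake_generate, ["a cat", "a dog"])
print(results)
```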