Ace-Step 1.5 XL, Song + Lyrics + Album Cover + Video Publisher - All in one workflow

Updated: Apr 13, 2026

Tags: base model, z-image, ace step 1.5

Type: Workflows
File: Archive (Other), 33.62 KB
Published: Feb 6, 2026
Base Model: ZImageTurbo
Hash (AutoV2): 961FBF040B

You need to have started your server WITHOUT SageAttention.

This workflow uses the latest Ace-Step 1.5 and XL models to create songs. It has 5 stages, which can be toggled on or off to suit your needs.

The non-XL models require fewer steps. My best results are from:

Turbo with 8-12 steps.

XL, Turbo 20-30 steps.

The publishing part of this workflow will publish your static-thumbnail videos with audio at around 5 MB file size, which is perfect for social media or personal playlists.
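
To get a feel for what that 5 MB figure means, here is a rough back-of-the-envelope sketch in Python. It is not part of the workflow: the function name is made up for illustration, the 3-minute song length is an assumed example, and 1 MB is treated as 1,000,000 bytes.

```python
def average_bitrate_kbps(file_size_mb: float, duration_seconds: float) -> float:
    """Average total bitrate (audio + video) for a file of the given size.

    Assumption: 1 MB = 1,000,000 bytes. The result is only an average
    over the whole file, not what any encoder actually targets.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    total_bits = file_size_mb * 1_000_000 * 8
    return total_bits / duration_seconds / 1000


# Hypothetical example: a 3-minute song published at around 5 MB.
print(round(average_bitrate_kbps(5, 180)))  # prints 222
```

So a 3-minute song at that size averages a bit over 200 kbps for audio and the static image combined.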

Whether you only want image generation or song lyric prompt generation, the prompt generation part can help you produce a strong song preprocess prompt.

The prompt generation process can also generate prompts specialized in visualized song album covers.

And finally, it can merge all of that into a video with an album cover as a static thumbnail.

The Ace-Step 1.5 models I used are:

acestep_v1.5_turbo.safetensors

qwen_0.6b_ace15.safetensors

qwen_4b_ace15.safetensors

But feel free to use the AIO model instead:

ace_step_1.5_turbo_aio.safetensors

I personally like the output from the large Qwen 4B CLIP encoder the best, as it follows the provided preprocess prompt more closely.

The prompt generator uses Qwen VL 4B Instruct with a custom-tailored system prompt for both song and album cover prompt generation. In my case I use the abliterated version, so it can output both SFW and NSFW lyrics or images.

I run this on an NVIDIA RTX 5090, and each stage takes about 3-5 seconds to generate. For lower-end machines you can use the AIO model loader instead of the ones I have hooked up.

This is supposed to be a beginner-level workflow, and all the math, such as time in seconds/minutes and frame count, is calculated automatically. Frame count doesn't really matter in videos with only audio and a static image, which is why I set the FPS to 16 or even lower.
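
If you are curious what that calculation looks like, here is a minimal Python sketch of the idea. The function name and the round-up behavior are assumptions for illustration; the workflow's own math nodes may do it differently.

```python
import math


def frame_count(minutes: int, seconds: float, fps: int = 16) -> int:
    """Number of frames needed to cover a song of the given length.

    Assumption: the result is rounded up so the video is never
    shorter than the audio.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    total_seconds = minutes * 60 + seconds
    return math.ceil(total_seconds * fps)


# A 3 min 30 s song at 16 FPS: 210 seconds * 16 frames per second.
print(frame_count(3, 30))  # prints 3360
```

Since the picture never changes, lowering the FPS only lowers the frame count; the song length stays the same.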

The group bypasser toggle is programmed to have at most 1 group active at a time, so you cannot run a Qwen VL prompt + Ace song + Z-Image rendering together. This is to avoid memory issues. Also important: it is advised to turn off SageAttention, so run the server without the --use-sage-attention flag.
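
The "at most 1 active" rule can be pictured with a small Python sketch. This is not how the bypasser node is implemented; the group labels and the function name are made up for illustration, and only the three stages named above are listed.

```python
from typing import Optional

# Hypothetical labels for three of the workflow's stages.
GROUPS = ["Qwen VL prompt", "Ace-Step song", "Z-Image render"]


def select_group(active: Optional[str], groups=GROUPS) -> dict:
    """Return an on/off map where at most one group is enabled.

    Passing None bypasses every group.
    """
    if active is not None and active not in groups:
        raise ValueError(f"unknown group: {active}")
    return {name: name == active for name in groups}


state = select_group("Ace-Step song")
print(state)
# {'Qwen VL prompt': False, 'Ace-Step song': True, 'Z-Image render': False}
```

Turning one group on automatically turns the others off, so only one set of models is in use at a time.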

A new version is coming up with some cool features!

Feel free to post your comments or awesome songs and album covers.