
Ace Step 1.5 XL Turbo and SFT - TEXT to AUDIO model with Ollama

Name: Ace Step 1.5 XL Turbo and SFT - TEXT to AUDIO model with Ollama
Rating: 5 (39 reviews)
Author: tremolo28


Updated: Apr 14, 2026

tool

Tags: ace step 1.5 xl, acestep, text to song, ace step 1.5, text to audio

Download (23.64 KB)


Details

Type	Workflows
Stats	53 0
Reviews	Positive (3)
Published	Apr 14, 2026
Base Model	Other

1 File

About this version


V1.6 Ace Step 1.5 Turbo and SFT, normal and XL models, with Ollama.

Updated the settings for the XL models and added a third system prompt for tags to choose from, which produces more descriptive song descriptions.

The 1.5 XL SFT pipeline now has an "Adaptive Projected Guidance" node and a negative prompt.
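For readers unfamiliar with Adaptive Projected Guidance, its core idea can be sketched in plain Python: the usual CFG update (cond minus uncond) is split into a part parallel to the conditional prediction and a part orthogonal to it, and the parallel part is down-weighted by `eta`. This is a simplified illustration on flat lists, not the ComfyUI node's code; the name `apg_guidance` is hypothetical, and the node's norm-threshold and momentum options are left out.

```python
def apg_guidance(cond, uncond, scale, eta=0.0):
    """Combine two predictions with a projected guidance update.

    cond, uncond: flat lists of floats of equal length.
    scale: guidance scale (the CFG value).
    eta: weight of the component parallel to `cond`; eta=1.0 reproduces plain CFG.
    """
    diff = [c - u for c, u in zip(cond, uncond)]            # standard CFG direction
    norm_sq = sum(c * c for c in cond)
    coef = sum(d * c for d, c in zip(diff, cond)) / norm_sq if norm_sq else 0.0
    parallel = [coef * c for c in cond]                      # part of diff along cond
    orthogonal = [d - p for d, p in zip(diff, parallel)]     # remainder, orthogonal to cond
    return [c + (scale - 1.0) * (o + eta * p)
            for c, o, p in zip(cond, orthogonal, parallel)]


cond, uncond = [1.0, 2.0, 0.5], [0.8, 1.5, 0.7]
cfg = [u + 3.0 * (c - u) for c, u in zip(cond, uncond)]     # plain CFG at scale 3
print(apg_guidance(cond, uncond, 3.0, eta=1.0))             # same as cfg up to rounding
```

With the default `eta=0.0`, only the orthogonal part of the guidance is applied, which is what tends to reduce oversaturation at higher CFG values.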

V1.5 Ace Step 1.5 Turbo and SFT, normal and XL models, with Ollama:

Set up to create up to 4 tracks in one run for comparison: 2x Ace 1.5 and 2x Ace 1.5 XL, each with a Turbo and an SFT model (each track can be switched on/off individually).
The VAE decode was changed to a tiled audio VAE decode, which uses less VRAM.
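A tiled decode saves VRAM by decoding the latent in overlapping chunks along the time axis instead of all at once, so peak memory depends on the tile size rather than the song length. The sketch below only computes such tile boundaries; `tile_ranges`, the tile length and the overlap are illustrative assumptions, not the settings or code of ComfyUI's tiled audio VAE decode node.

```python
def tile_ranges(length, tile, overlap):
    """Split [0, length) into windows of at most `tile` frames overlapping by `overlap`."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    step = tile - overlap
    ranges = []
    start = 0
    while True:
        end = min(start + tile, length)
        ranges.append((start, end))
        if end >= length:          # last window reaches the end of the latent
            break
        start += step
    return ranges


print(tile_ranges(10, 4, 1))
```

Each window would be decoded separately, and the overlapping regions blended when stitching the audio back together.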

XL Models: https://huggingface.co/Comfy-Org/ace_step_1.5_ComfyUI_files/tree/main/split_files/diffusion_models

XL model merges (e.g. Turbo merged with SFT): https://huggingface.co/Aryanne/acestep-v15-test-merges/tree/main/xl

The VAE and text encoders are the same as for Ace 1.5 (see download links below).

See below for some tips on which models and settings to start with.

GGUF models (normal and XL): https://huggingface.co/Serveurperso/ACE-Step-1.5-GGUF/tree/main

V1.2 Ace Step 1.5 Turbo and SFT model with Ollama, Text to Audio/Song (examples below)

Small update to the GUI, system prompts and SFT sampler "engine".

V1.0 Ace Step 1.5 Turbo and SFT model with Ollama, Text to Audio/Song

Ace Step uses TAGS and LYRICS to create a song. These can be generated by Ollama or written as your own prompts.

- Any song, artist or other description can be used as a reference to generate tags and lyrics.
- Outputs up to two songs, one generated by the Turbo model, the other by the SFT model (experimental).
- Key scale, BPM and song duration can be randomized.
- Supports dynamic prompts.
- Creates a suitable song title and filenames with Ollama.
- LoRA loader included; hope to see some LoRAs soon!
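Randomizing key scale, BPM and duration amounts to drawing each value independently per run. A minimal sketch, assuming a key-scale string such as "C major" and the BPM and duration ranges shown; these ranges and the `random_song_params` name are hypothetical, not the workflow's actual defaults.

```python
import random

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MODES = ["major", "minor"]


def random_song_params(rng, bpm_range=(70, 160), duration_range=(60, 240)):
    """Draw a random key scale, BPM and duration in seconds (ranges are assumptions)."""
    return {
        "keyscale": f"{rng.choice(NOTES)} {rng.choice(MODES)}",
        "bpm": rng.randint(*bpm_range),           # inclusive on both ends
        "duration": rng.randint(*duration_range),
    }


rng = random.Random(42)  # fixed seed so a run can be reproduced
print(random_song_params(rng))
```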

Important: Do not use Sage Attention in your ComfyUI startup parameters, and avoid the --lowvram setting, as this might force the text encoder to run very slowly on the CPU instead of the GPU. I recommend toggling link visibility to hide the wires.
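A quick way to check a launch command against this advice is to scan its arguments. This sketch assumes ComfyUI's `--use-sage-attention` and `--lowvram` flag spellings; the helper name `check_launch_args` is hypothetical.

```python
import shlex

# Flags the workflow advice recommends against (flag spellings assumed from ComfyUI's CLI).
DISCOURAGED = {
    "--use-sage-attention": "Sage Attention is not recommended for this workflow",
    "--lowvram": "--lowvram may push the text encoder onto the CPU",
}


def check_launch_args(command):
    """Return a warning for each discouraged flag found in a ComfyUI launch command."""
    args = shlex.split(command)
    return [msg for flag, msg in DISCOURAGED.items() if flag in args]


print(check_launch_args("python main.py --lowvram --listen"))
```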

Download Files:

Ace Step 1.5 TURBO model: https://huggingface.co/Comfy-Org/ace_step_1.5_ComfyUI_files/tree/main/split_files/diffusion_models
Ace Step 1.5 SFT model: https://huggingface.co/ACE-Step/acestep-v15-sft/tree/main (download model.safetensors and rename it)
Text encoders: https://huggingface.co/Comfy-Org/ace_step_1.5_ComfyUI_files/tree/main/split_files/text_encoders (Qwen_0.6b and Qwen_4b are required; 1.7b is a smaller alternative to 4b)
VAE: https://huggingface.co/Comfy-Org/ace_step_1.5_ComfyUI_files/tree/main/split_files/vae

Ollama models are required for tags, lyrics and song title. You can choose 1, 2 or 3 different models: tags and lyrics might need a bigger model (>7B), while the song title can use a smaller one:

https://ollama.com/huihui_ai/qwen3-vl-abliterated (for tags and lyrics, able to use thinking)
https://ollama.com/artifish/llama3.2-uncensored (small and fast, for song title and tags)
https://ollama.com/mirage335/Llama-3-NeuralDaredevil-8B-abliterated-virtuoso (all-round model, fast, usable for tags, lyrics and song title; recommended)
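Outside ComfyUI, the same split (a larger model for tags and lyrics, a smaller one for the song title) could be reproduced against Ollama's local REST API, whose `/api/generate` endpoint takes a JSON body with `model`, `prompt`, an optional `system` and `stream`. A standard-library sketch, assuming Ollama on its default port 11434 and the model names from the links above (check the exact tags with `ollama list`); the role mapping and the helper names are illustrative, not the workflow's own system prompts or nodes.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

# Illustrative role-to-model mapping: bigger model for tags/lyrics, small one for the title.
ROLE_MODELS = {
    "tags": "mirage335/Llama-3-NeuralDaredevil-8B-abliterated-virtuoso",
    "lyrics": "mirage335/Llama-3-NeuralDaredevil-8B-abliterated-virtuoso",
    "songtitle": "artifish/llama3.2-uncensored",
}


def build_payload(role, prompt, system=None):
    """Build a non-streaming /api/generate request body for the given role."""
    payload = {"model": ROLE_MODELS[role], "prompt": prompt, "stream": False}
    if system:
        payload["system"] = system
    return payload


def generate(role, prompt, system=None, url=OLLAMA_URL):
    """Send the request and return the generated text (needs a running Ollama server)."""
    data = json.dumps(build_payload(role, prompt, system)).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

For example, `generate("songtitle", "Suggest a short title for a synthwave song about night drives.")` would return the small model's answer as a string.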

Update 9th of Feb 26: alternative Turbo and SFT models:

Turbo continuous: https://huggingface.co/ACE-Step/acestep-v15-turbo-continuous/tree/main
SFT-Shift1: https://huggingface.co/ACE-Step/acestep-v15-turbo-shift1/tree/main
SFT-Shift3: https://huggingface.co/ACE-Step/acestep-v15-turbo-shift3/tree/main
Merges of SFT, Turbo and Base model: https://huggingface.co/Aryanne/acestep-v15-test-merges/tree/main

Which models to start with?

My current choice for the normal model: Turbo-SFT merge_ta_0.5 and SFT-Shift1, with these settings:
- Turbo-SFT_merge model with sampler: er_sde, scheduler: beta57 (or beta), 22 steps
- SFT-Shift1 model with sampler euler, scheduler: normal, 138 steps
XL Model settings:
- Turbo-SFT merge model: sampler: er_sde, scheduler: sgm_uniform, 42 steps
  - alternative: sampler: res_s2, scheduler: beta57 (requires RES4LYF custom nodes)
- SFT model: sampler: er_sde, scheduler: simple, 28 steps; try CFG >2 (might work better with an "Adaptive Projected Guidance" node)
Ollama Model: Llama-3-NeuralDaredevil-8b-abliterated
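The starting settings above can be kept as a small lookup table, for example to script batch comparisons. The values are copied from the list; the key names, the `PRESETS` table and `preset_for` are hypothetical. No CFG value is stored because the list only gives a hint ("try CFG >2") for the XL SFT model.

```python
# Starting points from the model tips (sampler/scheduler names as they appear in ComfyUI).
PRESETS = {
    "normal/turbo-sft-merge": {"sampler": "er_sde", "scheduler": "beta57", "steps": 22},  # or "beta"; beta57 needs RES4LYF
    "normal/sft-shift1": {"sampler": "euler", "scheduler": "normal", "steps": 138},
    "xl/turbo-sft-merge": {"sampler": "er_sde", "scheduler": "sgm_uniform", "steps": 42},
    "xl/sft": {"sampler": "er_sde", "scheduler": "simple", "steps": 28},  # try CFG > 2, optionally with APG
}


def preset_for(name):
    """Return a copy so callers can tweak a preset without changing the table."""
    return dict(PRESETS[name])
```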

For more information on the models, see the thread in the discussion below.

Save Location:

📂 ComfyUI/
└── 📂 models/
    ├── 📂 diffusion_models/
    │   └── acestep_v1.5_turbo.safetensors
    ├── 📂 text_encoders/
    │   ├── qwen_0.6b_ace15.safetensors
    │   └── qwen_4b_ace15.safetensors (or 1.7b)
    └── 📂 vae/
        └── ace_1.5_vae.safetensors
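To verify that the files ended up where the tree expects them, a short pathlib check can list anything missing. The file names are taken from the tree (adjust them if you renamed downloads or use the 1.7b text encoder); the `missing_files` helper is hypothetical.

```python
from pathlib import Path

# Relative paths from the save-location tree.
EXPECTED = [
    "models/diffusion_models/acestep_v1.5_turbo.safetensors",
    "models/text_encoders/qwen_0.6b_ace15.safetensors",
    "models/text_encoders/qwen_4b_ace15.safetensors",  # or the 1.7b variant
    "models/vae/ace_1.5_vae.safetensors",
]


def missing_files(comfy_root):
    """Return the expected model files that are not present under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]


print(missing_files("ComfyUI"))
```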

Custom Nodes used:

Optional: RES4LYF, needed for the beta57 scheduler (gives a bit more punch): https://github.com/ClownsharkBatwing/RES4LYF

Examples of various styles:

Ollama help:

Install Ollama from https://ollama.com/
Download a model: go to a model page, choose a model, then hit the copy button, e.g. https://ollama.com/mirage335/Llama-3-NeuralDaredevil-8B-abliterated-virtuoso
Open a terminal and paste the command, e.g.: ollama run huihui_ai/qwen3-vl-abliterated
The model will be downloaded and can then be selected in the green ComfyUI node "Ollama Connectivity". Hit "Reconnect" to refresh.
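To confirm that Ollama sees a freshly pulled model, you can also query its local `/api/tags` endpoint, which returns JSON with a `models` list whose entries carry a `name`. A standard-library sketch, assuming the default address http://localhost:11434; the function names are hypothetical.

```python
import json
import urllib.request


def parse_model_names(payload):
    """Extract model names from a decoded /api/tags JSON response."""
    return [m["name"] for m in payload.get("models", [])]


def list_ollama_models(host="http://localhost:11434"):
    """Query a running Ollama server for its locally installed models."""
    with urllib.request.urlopen(f"{host}/api/tags") as resp:
        return parse_model_names(json.loads(resp.read()))
```

Calling `list_ollama_models()` with Ollama running should include the model you just pulled; if it does not, the ComfyUI node will not list it either.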