home models images videos 3D Models articles comics challenges updates shop

Ace Step 1.5 XL Turbo and SFT - TEXT to AUDIO model with Ollama

Name: Ace Step 1.5 XL Turbo and SFT - TEXT to AUDIO model with Ollama
Rating: 5 (71 reviews)
Author: tremolo28

2.6k

Updated: Jul 4, 2026

tool

acestep text to song ace step 1.5 text to audio ace step 1.5 xl

Download

1 variant available

Archive Other

AceStep_1_5_XL_V1_8.zip

28.2 KB

Verified: a day ago

Download (28.2 KB)

Details

Type

Workflows

Stats

Reviews

Positive

(3)

Published

Jul 3, 2026

Base Model

ACE Audio

Hash

AutoV2

E7FC15307F

About this version

default creator card background decoration

1.7K

3.2K

96.9K

tremolo28

Joined Dec 5, 2022

The Workflow was setup to have a clean "GUI" showing only parameters that matter, so you might want to toggle off Link visibility, like in above screenshot.

V1.8 Ace Step 1.5. Turbo and SFT normal and XL model with Ollama. Text to Audio/Song (examples below).

small update on Lora handling to better include Lora trigger words/phrases

Check this link with 1.5.Turbo XL Loras by Ryanontheinside: https://huggingface.co/ryanontheinside/models
1.5 XL Loras can be loaded on any XL Slot (Turbo, SFT or merges), it improve the sound very well, like more authentic guitars for rock&metal, etc. Also song structures follow the genre better.
Loras require a trigger word that can be found on the main page per Lora of above link. Use the trigger word in the new node "Pretext" within the Input section of the workflow, added around 25 of those triggers to a Note in the WF for copy&paste.
See post in discussion below for more info.

Key features:

Can use any Song, Artist as reference or any other description to generate tags and lyrics.
Will output up to 4 songs, each by Turbo, SFT, Turbo XL and SFT XL model (or any merge or Base model).
Keyscales, bpm and song duration can be randomized.
able to use dynamic prompts.
creates suitable songtitle and filenames with Ollama.
saves songs as MP3 with tags like lyrics, artist, genre, etc.

V1.7 Ace Step 1.5. Turbo and SFT normal and XL model with Ollama. Text to Audio/Song.

replaced the "save Mp3" nodes to allow ID3 tags to include data like artist name, bpm, genre, lyrics, etc.
- requires custom node: https://github.com/mattjohnpowell/comfyui-audio-expo
added a feature to save all relevant data (tags, lyrics, etc.) as a separate text (.txt) file
audio render processing remains unchanged to previous version

V1.6 Ace Step 1.5. Turbo and SFT normal and XL model with Ollama.

updated the settings for XL models and added a 3rd System Prompt for tags to chose, with more descriptive song descriptions.

1.5 XL SFT pipeline now has an "Adaptive Projected Guidance" node and negative prompt.

** See below some tips which model and settings to start with.

V1.5 Ace Step 1.5. Turbo and SFT normal and XL model with Ollama:

setup to create up to 4 tracks in a run, 2x Ace1.5 and 2x Ace1.5 XL, each with Turbo and SFT model, to compare (can be individually switched on/off)
VAE changed to tiled Audio VAE decode, uses less Vram.

V1.2 Ace Step 1.5. Turbo and SFT model with Ollama Text to Audio/Song

small update to GUI, system prompts and SFT sampler "engine"

V1.0 Ace Step 1.5. Turbo and SFT model with Ollama Text to Audio/Song

Ace Step uses TAGS and LYRICS to create a song. These can be generated by Ollama or by own prompts.

Avoid sage attention in your comfyui starting parameters, avoid --lowvram setting, as this might force Texencoder to run very slow on CPU instead of GPU.

Download Files:

Ace Step 1.5 TURBO model: https://huggingface.co/Comfy-Org/ace_step_1.5_ComfyUI_files/tree/main/split_files/diffusion_models
Ace Step 1.5 SFT model: https://huggingface.co/ACE-Step/acestep-v15-sft/tree/main (download model.safetensor and rename it)
Ace Step 1.5 XL Models: https://huggingface.co/Comfy-Org/ace_step_1.5_ComfyUI_files/tree/main/split_files/diffusion_models
Ace Step 1.5 XL Model merges (i.e. merge of turbo with SFT): https://huggingface.co/Aryanne/acestep-v15-test-merges/tree/main/xl
Textencoder: https://huggingface.co/Comfy-Org/ace_step_1.5_ComfyUI_files/tree/main/split_files/text_encoders (Qwen_0.6b and Qwen_4b required, 1.7b is a smaller alternative to 4b)
VAE: https://huggingface.co/Comfy-Org/ace_step_1.5_ComfyUI_files/tree/main/split_files/vae
- alternative VAE: https://huggingface.co/megagrump/Ace-Step-1.5-ScragVAE-ComfyUI

Ollama Models, required for tags, lyrics and songtitle, you can choose 1,2 or 3 different models, tags and lyrics might need a bigger model >7b, songtitle can use a smaller model:

https://ollama.com/mirage335/Llama-3-NeuralDaredevil-8B-abliterated-virtuoso (allround model, fast, usable for tags, lyrics and songtitle, recommended)
https://ollama.com/huihui_ai/qwen3-vl-abliterated (for tags and lyrics, able to use thinking)
https://ollama.com/artifish/llama3.2-uncensored (small and fast for songtitle and tags)

Alternative Turbo Models and merges (normal, non XL) :

Turbo continuous: https://huggingface.co/ACE-Step/acestep-v15-turbo-continuous/tree/main
Turbo-Shift1: https://huggingface.co/ACE-Step/acestep-v15-turbo-shift1/tree/main
Turbo-Shift3: https://huggingface.co/ACE-Step/acestep-v15-turbo-shift3/tree/main
Merges of SFT, Turbo and Base model: https://huggingface.co/Aryanne/acestep-v15-test-merges/tree/main

GGUF Models "normal" and XL: https://huggingface.co/Serveurperso/ACE-Step-1.5-GGUF/tree/main

Which models to start with ?

My current choice for normal model: Turbo-SFT merge_ta_0.5 & Turbo-Shift1, using these settings:
- Turbo-SFT_merge model with sampler: er_sde, scheduler: beta57 (or beta), 22 steps
- Turbo-Shift1 model with sampler euler, scheduler: normal, 138 steps
XL Model settings:
- XL Turbo-SFT merge model: sampler: er_sde, scheduler: sgm_uniform, 40 steps
  - alternative: sampler: res_s2, scheduler beta57 (requires RES4LYF custom nodes)
- XL SFT model: sampler: euler (or res_2s), scheduler: normal, 46 steps, CFG = 7.3, Adaptive Projected Guidance: eta = 1.05, norm_thresh= 1.3, momentum=0.0. Increase norm_thresh as the main parameter. These settings deliver "stabil" output for XL SFT,Base and their merges. The merges sound way better, pure SFT or Base introduce a lot of noise. I bypassed ModelSamplingAuraflow (see node next to model loader node). I think the base-turbo XL model merge fits well in that slot.
Disable "generate_audio_codes" in "TextEncodeAceStep" node to get different results, it works very well for many genres and reduces process time.
Ollama Model: Llama-3-NeuralDaredevil-8b-abliterated

More infos on models see thread below in discussion.

Save Location:

📂 ComfyUI/
├── 📂 models/
│ ├── 📂 diffusion_models/
│ │ └── acestep_v1.5_turbo.safetensors
│ ├── 📂 text_encoders/
│ │ ├── qwen_0.6b_ace15.safetensors
│ │ └── qwen_4b_ace15.safetensors (or 1.7b)
│ └── 📂 vae/
│ └── ace_1.5_vae.safetensors

Custom Nodes used:

optional (use Beta57 scheduler for a bit more punch, requires RES4LYF): https://github.com/ClownsharkBatwing/RES4LYF

Examples various styles:

Ollama help:

Install Ollama from https://ollama.com/
download a model: Go to a model page, chose a model , then hit the copy button, i.e. https://ollama.com/mirage335/Llama-3-NeuralDaredevil-8B-abliterated-virtuoso
open terminal and paste the model name, i.e.: ollama run huihui_ai/qwen3-vl-abliterated
model will be downloaded and can be selected in green comfy node "Ollama Connectivity". Hit "Reconnect" to refresh.