
Ace Step 1.5 Turbo and SFT - TEXT to AUDIO model with Ollama

Type: Workflows

Stats: 184 / 0

Published: Feb 8, 2026

Base Model: Other

Hash (AutoV2): 48041BFEB7

tremolo28

Ace Step 1.5 Turbo and SFT models with Ollama: text to audio/song (examples below)

Ace Step uses TAGS and LYRICS to create a song. These can be generated by Ollama or written in your own prompts.
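To make the Ollama step concrete, here is a minimal sketch that asks a locally running Ollama server for tags and lyrics through its `/api/generate` REST endpoint (default `http://localhost:11434`). The prompt text, the `TAGS:`/`LYRICS:` reply format and the helper names `build_payload`, `split_sections` and `generate` are hypothetical; the workflow's own Ollama nodes may phrase and parse this differently.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint


def build_payload(model: str, reference: str) -> dict:
    """Build a non-streaming /api/generate request (prompt wording is hypothetical)."""
    prompt = (
        f"Write music tags and song lyrics inspired by: {reference}\n"
        "Answer in two sections starting with 'TAGS:' and 'LYRICS:'."
    )
    return {"model": model, "prompt": prompt, "stream": False}


def split_sections(text: str) -> tuple[str, str]:
    """Split a reply of the form 'TAGS: ... LYRICS: ...' into (tags, lyrics)."""
    head, _, lyrics = text.partition("LYRICS:")
    tags = head.replace("TAGS:", "", 1)
    return tags.strip(), lyrics.strip()


def generate(payload: dict, url: str = OLLAMA_URL) -> str:
    """POST the payload to Ollama and return the generated text from 'response'."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

For example, `split_sections(generate(build_payload("your-model", "a song like Take On Me")))` would return a `(tags, lyrics)` pair, assuming the model follows the requested format; `your-model` stands for any model you have pulled into Ollama.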

  • Can use any song, artist or other description as a reference to generate tags and lyrics.

  • Outputs up to two songs: one generated by the Turbo model, the other by the SFT model (experimental).

  • Key scale, BPM and song duration can be randomized.

  • Supports dynamic prompts.

  • Creates a suitable song title and filename with Ollama.

  • LoRA loader included; hopefully some LoRAs will be available soon!
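The randomization and naming features listed above could look roughly like this sketch. The note names, the BPM range (70 to 160), the duration range (90 to 240 seconds), the "C major" key-scale format and the helpers `random_song_params` and `song_filename` are all assumptions for illustration; Ace Step and the workflow may use different values and formats.

```python
import random
import re

# Hypothetical key-scale vocabulary; Ace Step's accepted format may differ.
NOTES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]
MODES = ["major", "minor"]


def random_song_params(seed: int, bpm_range=(70, 160), duration_range=(90, 240)) -> dict:
    """Pick a key scale, BPM and duration (seconds) reproducibly from a seed."""
    rng = random.Random(seed)  # private generator: same seed, same choices
    return {
        "keyscale": f"{rng.choice(NOTES)} {rng.choice(MODES)}",
        "bpm": rng.randint(*bpm_range),
        "duration": rng.randint(*duration_range),
    }


def song_filename(title: str, model: str) -> str:
    """Turn a generated song title into a filesystem-safe name tagged with the model."""
    safe = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")
    return f"{safe or 'untitled'}_{model}"
```

Seeding a private `random.Random` rather than the global generator keeps a given seed reproducible even if other code also draws random numbers.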

Important: Do not use Sage Attention in your ComfyUI launch parameters, and avoid the --lowvram setting, as this might force the text encoder to run very slowly on the CPU instead of the GPU.
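One way to catch the settings warned about above is to scan the launch command before starting ComfyUI. This hypothetical helper assumes the Sage Attention option is spelled `--use-sage-attention`; check `python main.py --help` on your own install.

```python
import sys

# Launch options the note above advises against (flag spellings assumed; verify with --help).
DISCOURAGED = {
    "--use-sage-attention": "Sage Attention enabled",
    "--lowvram": "low-VRAM mode, which may push the text encoder onto the CPU",
}


def check_launch_args(argv: list[str]) -> list[str]:
    """Return one warning line for each discouraged flag found in argv."""
    return [f"{flag}: {DISCOURAGED[flag]}" for flag in argv if flag in DISCOURAGED]


if __name__ == "__main__":
    # e.g. python check_args.py main.py --lowvram --listen
    for warning in check_launch_args(sys.argv[1:]):
        print("warning:", warning)
```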


Download Files:

Ollama models (required for tags, lyrics and song title). You can choose one, two or three different models: tags and lyrics may need a larger model (>7B), while the song title can use a smaller one:


Update 9 Feb 2026: alternative Turbo and SFT models:


Which models to start with? Turbo, SFT-Shift1 and Llama3-NeuralDaredevil (for Ollama).
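The choice of one, two or three Ollama models can be pictured as a mapping from task to model. In this sketch (hypothetical helper `assign_models`), passing two models means the first, larger one writes tags and lyrics and the second, smaller one writes the song title, following the sizing advice above; this is an assumed convention, not necessarily how the workflow's nodes are wired.

```python
def assign_models(models: list[str]) -> dict[str, str]:
    """Map the three Ollama tasks (tags, lyrics, title) onto 1, 2 or 3 model names."""
    if len(models) == 3:
        tags, lyrics, title = models
    elif len(models) == 2:
        tags = lyrics = models[0]  # larger model (>7B) for tags and lyrics
        title = models[1]          # a smaller model is enough for the title
    elif len(models) == 1:
        tags = lyrics = title = models[0]
    else:
        raise ValueError("expected 1, 2 or 3 model names")
    return {"tags": tags, "lyrics": lyrics, "title": title}
```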


Save Location:

📂 ComfyUI/
├── 📂 models/
│   ├── 📂 diffusion_models/
│   │   └── acestep_v1.5_turbo.safetensors
│   ├── 📂 text_encoders/
│   │   ├── qwen_0.6b_ace15.safetensors
│   │   └── qwen_4b_ace15.safetensors (or 1.7b)
│   └── 📂 vae/
│       └── ace_1.5_vae.safetensors
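To verify that the files landed in the right folders, a small check like the following could be run against your ComfyUI directory. The paths come from the save-location tree above; the helper `missing_models` is hypothetical, and if you use the 1.7b text encoder or the SFT model you would add or swap in their actual file names in `EXPECTED`.

```python
from pathlib import Path

# Relative to ComfyUI/models, taken from the tree above.
# Swap in the 1.7b text-encoder file (or add the SFT model) if you use those.
EXPECTED = [
    "diffusion_models/acestep_v1.5_turbo.safetensors",
    "text_encoders/qwen_0.6b_ace15.safetensors",
    "text_encoders/qwen_4b_ace15.safetensors",
    "vae/ace_1.5_vae.safetensors",
]


def missing_models(comfy_root) -> list[str]:
    """Return the expected model files that are not present under <comfy_root>/models."""
    models_dir = Path(comfy_root) / "models"
    return [rel for rel in EXPECTED if not (models_dir / rel).is_file()]
```

For example, `missing_models("/path/to/ComfyUI")` returns an empty list when everything is in place.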


Custom Nodes used:

Optional: RES4LYF, required for the Beta57 scheduler (gives a bit more punch): https://github.com/ClownsharkBatwing/RES4LYF


Examples in various styles: