Hide Invisible Text Inside Any Audio File with Orion 4D in ComfyUI

You generated a voiceover. A music track. A cloned voice. You want to tag it with your origin info before it goes out into the world, invisibly, inaudibly, permanently.

Upload audio. Embed your text. Get two watermarked copies back. Listeners hear nothing different.

Run it now on Floyo!

How It Works

Upload your audio file. Type the text you want hidden inside it. The workflow produces two watermarked copies in one run, each using a different method, each inaudible, each recoverable later with the decoder.

Method 1: Steganography

Hides text directly in the sample bits of the audio. High capacity: a 20-second clip holds around 100KB of text, enough for full metadata, attribution strings, or structured data. Survives as long as the file stays WAV. MP3 re-encoding wipes it out.
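To make the idea concrete, here is a minimal sketch of least-significant-bit embedding, a common way to hide data in sample bits. It is an illustration under assumptions, not Orion 4D's actual code: the function names, the 4-byte length header, and the bit order are all hypothetical, and the audio is treated as a plain list of 16-bit integer samples.

```python
def embed_lsb(samples, payload, bits_per_sample=1):
    """Hide payload bytes in the low bits of integer PCM samples (hypothetical layout)."""
    # Assumption: a 4-byte big-endian length header tells the decoder where to stop.
    data = len(payload).to_bytes(4, "big") + payload
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad so the bit string splits evenly into per-sample chunks.
    bits += "0" * (-len(bits) % bits_per_sample)
    chunks = [bits[i:i + bits_per_sample] for i in range(0, len(bits), bits_per_sample)]
    if len(chunks) > len(samples):
        raise ValueError("payload exceeds the capacity of this clip")
    mask = (1 << bits_per_sample) - 1
    out = list(samples)
    for i, chunk in enumerate(chunks):
        # Clear the low bits, then write the payload bits in their place.
        out[i] = (out[i] & ~mask) | int(chunk, 2)
    return out


def extract_lsb(samples, bits_per_sample=1):
    """Read the low bits back and rebuild the payload."""
    mask = (1 << bits_per_sample) - 1
    bits = "".join(f"{s & mask:0{bits_per_sample}b}" for s in samples)
    length = int(bits[:32], 2)
    body = bits[32:32 + 8 * length]
    return bytes(int(body[i:i + 8], 2) for i in range(0, len(body), 8))


# Stand-in for real audio: 2000 signed sample values.
host = [(i * 37) % 2000 - 1000 for i in range(2000)]
secret = b"Originally generated by Floyo"
marked = embed_lsb(host, secret, bits_per_sample=2)
recovered = extract_lsb(marked, bits_per_sample=2)
```

Each sample changes by at most 3 steps out of 65,536 at 2 bits per sample, which is why the result sounds the same. It also shows why lossy re-encoding is fatal: any codec that alters sample values even slightly scrambles the low bits.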

Method 2: Frequency Watermarking

Tucks a quiet signal at 14000Hz, above most music content, inaudible to listeners. Carries less data than steganography but survives MP3 compression, re-encoding, and format conversion. Use this when the file is going out into the world and you need the mark to stick.
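One simple way to carry data on a high-frequency tone is on-off keying: switch a quiet 14000Hz sine on for a 1 bit and off for a 0 bit, then measure the energy at that frequency on decode. The sketch below does that with the Goertzel algorithm. It is a stand-in technique for illustration, not Orion 4D's actual scheme: the names, the bit length, the amplitude, and the threshold rule are all assumptions.

```python
import math

SAMPLE_RATE = 44100      # assumed sample rate
SAMPLES_PER_BIT = 441    # assumed: 10 ms per bit, exactly 140 cycles of 14000Hz


def embed_tone(samples, payload, freq=14000.0, amplitude=0.01):
    """Add a quiet tone during the windows that carry a 1 bit (on-off keying)."""
    bits = [(byte >> (7 - k)) & 1 for byte in payload for k in range(8)]
    if len(bits) * SAMPLES_PER_BIT > len(samples):
        raise ValueError("clip too short for this payload")
    out = list(samples)
    step = 2 * math.pi * freq / SAMPLE_RATE
    for i, bit in enumerate(bits):
        if bit:
            start = i * SAMPLES_PER_BIT
            for n in range(start, start + SAMPLES_PER_BIT):
                out[n] += amplitude * math.sin(step * n)
    return out


def tone_power(window, freq):
    """Goertzel algorithm: squared magnitude of one frequency in a window."""
    coeff = 2 * math.cos(2 * math.pi * freq / SAMPLE_RATE)
    s1 = s2 = 0.0
    for x in window:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def extract_tone(samples, n_bytes, freq=14000.0, amplitude=0.01):
    """Decode n_bytes by thresholding tone power in each bit window."""
    # Assumed rule: a bit is 1 if the tone is at least half its expected strength.
    threshold = (amplitude * SAMPLES_PER_BIT / 4) ** 2
    out = bytearray()
    for b in range(n_bytes):
        byte = 0
        for k in range(8):
            start = (b * 8 + k) * SAMPLES_PER_BIT
            window = samples[start:start + SAMPLES_PER_BIT]
            byte = (byte << 1) | (tone_power(window, freq) > threshold)
        out.append(byte)
    return bytes(out)


# Stand-in host audio: a 220Hz tone, floats in the range -1.0 to 1.0.
secret = b"Floyo"
host = [0.3 * math.sin(2 * math.pi * 220 * n / SAMPLE_RATE)
        for n in range(len(secret) * 8 * SAMPLES_PER_BIT)]
marked = embed_tone(host, secret)
recovered = extract_tone(marked, len(secret))
wrong_freq = extract_tone(marked, len(secret), freq=12000.0)
```

Decoding at 12000Hz instead of 14000Hz finds no tone and returns only zero bytes, which is the same failure you get from mismatched encoder and decoder settings. The trade-off against steganography is also visible: this sketch stores 100 bits per second, against tens of thousands for sample-bit hiding.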

Both outputs sound identical to the source. Anyone running the decoder can pull your text back out.

Key Inputs

Input Audio

WAV, MP3, or FLAC. Longer files hold more hidden text. The workflow shows your exact capacity on upload.

Text to Embed

Whatever you want hidden. Short strings work for provenance tags like "Originally generated by Floyo — [your name] — [date]". Longer strings work for full metadata, source attribution, or structured data. Stay under the capacity shown at the top of the workflow.
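Capacity is counted in bytes, not characters, so it helps to measure a tag the way it will be stored. The sketch below builds a tag in the format shown above and checks it against a capacity figure. It assumes UTF-8 encoding, and both helper names are hypothetical; take the capacity number from the workflow itself.

```python
from datetime import date


def provenance_tag(creator, when, platform="Floyo"):
    """Build a tag in the example format; \u2014 is the dash used as separator."""
    return f"Originally generated by {platform} \u2014 {creator} \u2014 {when.isoformat()}"


def fits(text, capacity_bytes):
    """True if the UTF-8 encoded text fits in the given capacity."""
    return len(text.encode("utf-8")) <= capacity_bytes


tag = provenance_tag("Ada", date(2025, 1, 1))
chars = len(tag)                    # character count
size = len(tag.encode("utf-8"))     # byte count: each separator dash takes 3 bytes
ok = fits(tag, capacity_bytes=100_000)
```

Non-ASCII characters such as accented names or the separator dash take more than one byte each, so a long tag can overshoot a tight capacity even when its character count looks safe.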

Steganography Density

High Density (2 bits per sample): default, doubles your capacity
1 bit: lower capacity, slightly lower detectability

Use high density for most use cases. Drop to 1 bit if detectability is a concern.
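The capacity arithmetic is simple enough to check by hand. This sketch gives a raw upper bound under assumed parameters (44.1kHz mono, no header overhead); the function name is hypothetical, and the capacity shown in the workflow is the number to trust.

```python
def stego_capacity_bytes(duration_s, sample_rate=44100, channels=1, bits_per_sample=2):
    """Raw upper bound: one payload chunk of bits_per_sample bits in every sample."""
    total_bits = int(duration_s * sample_rate) * channels * bits_per_sample
    return total_bits // 8


one_bit = stego_capacity_bytes(20, bits_per_sample=1)   # 110250 bytes
two_bit = stego_capacity_bytes(20, bits_per_sample=2)   # 220500 bytes
```

Going from 1 bit to 2 bits per sample doubles the bound exactly. Stereo files or higher sample rates raise it further, which is why longer and richer files hold more text.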

Frequency Mode

Default is Stealth High at 14000Hz. Works for voice and music. The encoder and decoder settings must match. If they don't, the text won't come back out on decode.

Volume (Frequency Encoder)

Default 0.8.

Push toward 1.0 for survival through heavy compression and low-bitrate MP3
Drop to 0.5 or lower for maximum inaudibility on headphones

Louder survives more. Quieter hides better. Test against the compression level your audio will face before publishing.

Which Output to Use

Use the steganography output when:

the file stays in WAV format
you need maximum text capacity
you're tagging files for internal tracking where format won't change

Use the frequency output when:

the file will be converted to MP3, AAC, or uploaded to any platform
you need the watermark to survive re-encoding
you're publishing AI-generated audio and want provenance to stick

When in doubt, use both. They're generated in the same run.

What This Is Great For

AI-generated audio provenance: Tag TTS output, music model results, and voice cloning before publishing. Gives you a way to prove origin later as AI-generated audio becomes harder to distinguish from real recordings.

Version tracking: Embed different strings in different versions of the same file to track which copy went where.

Content attribution: Embed creator name, date, model used, and platform in the audio file itself. With the frequency output, the metadata travels with the sound even after re-upload and re-compression.

Licensing and rights management: Tag audio assets with license information that lives in the audio itself rather than in a separate tag field. Use the frequency output if the file may be re-encoded along the way.

What to Watch Out For

Steganography does not survive MP3 conversion. If your audio is going to the web, a podcast platform, or any service that re-encodes on upload, use the frequency-encoded output. The steganography output is for WAV-only workflows.

Encoder and decoder frequency modes must match. If you encoded with Stealth High, you must decode with Stealth High. Mismatched settings return nothing.

This is a provenance tool, not a security tool. Determined removal of any watermark is possible. It's designed for tagging and attribution, not tamper resistance.

Watermarks can be detected with audio analysis tools, especially at maximum steganography density or high frequency encoder volume. For sensitive use cases, test your output against the analysis tools your audience might use.