Gemini Gems for Z-Image Prompt Generation

If you’ve been following my recent work, you know I’ve been diving deep into the Z-Image Turbo model. It is an incredibly fast and efficient image generation model, but like all AI models, it sings when you know exactly how to talk to it.

I previously shared a Z-Image Prompt Guide and released a set of custom ComfyUI nodes that automate the prompting process. While those nodes are powerful, I realized that not everyone wants to install custom nodes or mess with complex workflows just to get a good prompt. Sometimes, you just want to have a conversation, refine an idea, and get the perfect text to copy-paste into your generator.

That is where Gemini Gems come in.

I’m going to show you how to take that same Z-Image expertise and package it into a custom Gemini Gem. This allows you to easily generate high-quality Z-Image Turbo prompts through a natural chat interface that doesn't get in the way of your ComfyUI usage.

What is a Gemini Gem?

Think of a "Gem" as a version of Gemini that has been set up for a specific job. Instead of explaining how to write a Z-Image prompt every time you start a new chat, you give the Gem those instructions once. From then on, every time you open that Gem, it already knows the rules, the keywords, and the structure you need.

Why Use a Gem Instead of the ComfyUI Nodes?

While my ComfyUI nodes are great for automation, using a Gem offers a few distinct advantages:

Conversational Refinement: You can go back and forth. If the prompt isn't quite right, you can say, "Make it spookier" or "Change the lighting to sunset," and the Gem will update the prompt while keeping the Z-Image formatting intact.
Zero Setup: Once created, it’s available on your phone, tablet, or desktop instantly.
Conversation Persistence: The chat not only lets you have a natural conversation, it is also saved between sessions, so you can revisit a conversation and pick it up again later.

How to Create Your "Z-Image Assistant" Gem

Creating this Gem is surprisingly simple. You are essentially taking the "System Prompt" logic from my ComfyUI nodes and giving it to Gemini directly. Of course you'll need a Google account for this.

Open Gemini and look for the Gem Manager (usually on the left sidebar).
Click "New Gem".
Name your Gem: something like Z-Image Prompt Pro.
Instructions: This is the secret sauce. You will want to copy and paste the text from the Z-Image Prompt Guide. However, you want to adjust it a little bit to frame it properly.
- Ensure the instructions explicitly state: "Your goal is to take a user's natural language description and convert it into a strictly formatted text prompt optimized for the Z-Image Turbo model, following these guidelines..."
- Make sure that you give a very clear instruction to return text. I had to emphasize this because Gemini occasionally tried to use Nano Banana to actually generate images. I added: "DO NOT generate an image. Do not use nano banana. You must return text only. It is IMPORTANT to return text only."
- You will also want to instruct the Gem not to refer to the filenames of any images uploaded to Gemini. Otherwise, when you provide a reference image, the prompt itself will sometimes say "that looks like image_1.png" or something similar. When you put that into ComfyUI, it won't know what "image_1.png" is.
Click Save.
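
If you would rather assemble the instruction text once and keep it in a file, the framing steps above can be sketched in a few lines of Python. This is only an illustration: the function name build_gem_instructions and the exact wording of the filename rule are my own assumptions, and the placeholder stands in for the real Z-Image Prompt Guide text.

```python
# A minimal sketch, assuming you paste the guide text in yourself.
# The quoted framing and text-only rule come from the steps above;
# the wording of NO_FILENAMES_RULE is an assumption, adjust it to taste.

FRAMING = (
    "Your goal is to take a user's natural language description and convert it "
    "into a strictly formatted text prompt optimized for the Z-Image Turbo model, "
    "following these guidelines..."
)

TEXT_ONLY_RULE = (
    "DO NOT generate an image. Do not use nano banana. "
    "You must return text only. It is IMPORTANT to return text only."
)

NO_FILENAMES_RULE = (
    "Never mention the filenames of uploaded images in the prompt you return."
)


def build_gem_instructions(guide_text: str) -> str:
    """Join the framing, the two guard rules, and the guide into one block of text."""
    sections = [FRAMING, TEXT_ONLY_RULE, NO_FILENAMES_RULE, guide_text.strip()]
    return "\n\n".join(sections)


if __name__ == "__main__":
    # Placeholder only: replace with the actual Z-Image Prompt Guide text.
    placeholder_guide = "(paste the Z-Image Prompt Guide text here)"
    print(build_gem_instructions(placeholder_guide))
```

The printed text is what you would paste into the Instructions field before clicking Save. Putting the rules ahead of the guide is a judgment call on my part; the order is easy to change.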

Now, whenever you want to create an image, just open this Gem, type "A cat sitting on a neon fence," and it will spit out the technically optimized prompt for you to use.

If it needs more detail, it will ask you questions before generating the prompt. For example, it may ask about the image style: photographic, 3D rendering, illustration, etc. It may also ask about the composition, such as whether it's a close-up shot.

Note: If you see something you don't like, you can go back and edit your Gem. This will let you do some context engineering to get things just right.

Advanced Example: Creating a Stylized "Noir-Ink" Gem

One of the best features of Gems is that you can "fork" them to create specific artistic personalities. I personally keep two versions of this Gem. The first is the general one above. The second is what I call "Noir-Ink."

For this Gem, I used the exact same Z-Image instructions, but I added a "Master Rule" at the very top of the instructions:

"Regardless of the user's input, ALWAYS structure the prompt to generate an illustration in a dark, gritty, pen-and-ink style. Use keywords like 'ink wash,' 'hatching,' 'high contrast,' and 'noir atmosphere.' If the user uploads a photograph, describe it as if it were a drawing in this style."

This is perfect for maintaining a consistent portfolio. I don’t have to keep reminding the AI to "make it look like a drawing" because the Gem just does it automatically.
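
Forking a Gem this way amounts to putting one extra rule in front of the same base instructions. Here is a small sketch of that idea, assuming you keep the base instructions as plain text. The helper name fork_with_master_rule and the "MASTER RULE:" label are hypothetical; the rule text itself is the one quoted above.

```python
# A minimal sketch of "forking" a Gem's instructions by prepending a rule.
# The helper name and the "MASTER RULE:" label are illustrative assumptions.

MASTER_RULE = (
    "Regardless of the user's input, ALWAYS structure the prompt to generate an "
    "illustration in a dark, gritty, pen-and-ink style. Use keywords like "
    "'ink wash,' 'hatching,' 'high contrast,' and 'noir atmosphere.' If the user "
    "uploads a photograph, describe it as if it were a drawing in this style."
)


def fork_with_master_rule(base_instructions: str, master_rule: str = MASTER_RULE) -> str:
    """Return new instructions with the master rule placed at the very top."""
    return f"MASTER RULE: {master_rule}\n\n{base_instructions.strip()}"


if __name__ == "__main__":
    # Placeholder only: replace with your general Z-Image Gem instructions.
    base = "(your general Z-Image Gem instructions)"
    print(fork_with_master_rule(base))
```

Because the base text is untouched, the general Gem and the styled Gem stay in sync: update the base once, then regenerate each fork.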

I've attached my "Noir-Ink" Gem instructions to this article so you can reproduce it. The cover image for this article was made with this Gem along with my Wet Ink LoRA, which really helps with that style.

The Multimodal Workflow: From Image to Prompt

This is the feature that makes the chat interface superior to static nodes. Sometimes, you can’t describe what you want in a single request, but you have a picture that captures the vibe.

With your Z-Image Gem open, you can:

Drag and Drop an image into the chat window.
Type a simple instruction like: "Write a Z-Image prompt based on the composition of this photo, but change the subject to a futuristic robot."

The Gem acts as your translator. It "sees" the image, analyzes the lighting, camera angle, and composition, and then translates those visual elements into the technical text keywords that Z-Image Turbo needs to replicate that look.
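
When you work from a reference image, it is worth checking that the returned prompt doesn't still mention the uploaded file by name before you paste it into ComfyUI. The following is a rough sketch of such a check. The function name and the list of file extensions are assumptions, and it only catches simple names like "image_1.png".

```python
import re

# A rough sketch: flag image filenames that leaked into a generated prompt.
# Assumption: uploads use common image extensions; extend the list as needed.
FILENAME_PATTERN = re.compile(
    r"\b[\w\-]+\.(?:png|jpe?g|webp|gif|bmp)\b",
    re.IGNORECASE,
)


def find_filename_references(prompt: str) -> list[str]:
    """Return every substring of the prompt that looks like an image filename."""
    return FILENAME_PATTERN.findall(prompt)


if __name__ == "__main__":
    prompt = "A futuristic robot, composition that looks like image_1.png"
    leaks = find_filename_references(prompt)
    if leaks:
        print("Remove these before pasting into ComfyUI:", leaks)
    else:
        print("No filename references found.")
```

If the check keeps finding filenames, that is a sign the Gem's instructions need a stronger rule about not referring to uploaded files.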

While you can provide image input with the custom nodes, you often need a conversation. The response from the custom nodes (from Gemini) will be questions. You can then copy and paste those into your prompt input and execute another run of the ComfyUI workflow to answer them. However, each run is a new chat session with Gemini's API. That is OK, since you've kept all of the relevant information, but the user experience isn't as good when you're copying and pasting the conversation like that.

Summary

If you are using Z-Image Turbo, you don't need to struggle with the prompting curve alone. By spending five minutes setting up a Gemini Gem, you get an AI partner that knows the documentation better than you do, ready to brainstorm 24/7.

Step 1: Copy the Z-Image Guide text (or the markdown file I've attached to this article).
Step 2: Paste it into a new Gemini Gem.
Step 3: Start chatting (or dropping images) to create your best work yet.

The best part is that you can edit your Gems later if you find that you need to make refinements. You can even send Gemini example images that were generated from the prompts and explain what you like or don't like about them to help refine the Gem to provide a better prompt in the future.