
Workflows: A Beginners Tutorial & Hands on Walkthrough Part 1


Apr 4, 2026



This is part one of a two-part series of articles I will write on this subject. Part 1 covers the introduction; Part 2 will be a hands-on walkthrough.

Part 2 can be found here: https://civitai.com/articles/28171/workflows-a-beginners-tutorial-and-hands-on-walkthrough-part-2

Introduction:

I love building stuff. I'm an engineer and a carpenter and now build waterparks for a living. How things work and creating things is my passion. I create workflows that get used by a lot of people and I get asked a lot of questions, especially about ComfyUI in general and how to build with it, so I thought I'd take the time to put together a few articles over the next while (weeks, a month, however long) to help you understand the basics, then move on to more advanced workflows.

In this article, I'll go over the following:

  • A basic introduction to the ComfyUI interface.

  • A walkthrough of the basic nodes and structures.

In part 2, I will go through the following:

  • A hands-on walkthrough of wiring a basic workflow.

    • how nodes work

    • how to connect nodes

    • groups

    • generating your first image

    • 💡 tips & tricks

  • other stuff

The ComfyUI Interface

Screenshot 2026-04-03 174125.png

Obviously, there are many parts and pieces to this, but I am going to go over the most used, essential parts.

Note: I use the Portable version, not Desktop, so some of this may look different.

Workflow settings/ save/ tabs (upper left)

Screenshot 2026-04-03 174658.png
  • When you click the Comfy logo in the upper left-hand corner, this drop-down appears. It's pretty self-explanatory.

  • Files & saving:

    Screenshot 2026-04-03 175739.png
    • It is important to understand that all of your workflows are saved in the ComfyUI/workflow folder when you save them.

    • Export is used when you want to save a copy in another location (on Portable, it automatically downloads to your Downloads folder).

  • Save settings:

    Screenshot 2026-04-03 180154.png

    Under the "Settings" tab, in the ⚙️ Comfy application, scroll down to Workflow (close to the bottom).

    • Change your setting to "after delay". This makes sure that what you are working on is saved if you crash.

Comfy Manager

Screenshot 2026-04-03 174752.png
  • The main brains of the program. This is where you download your models and Custom Nodes

  • Custom Nodes Manager/ Install missing custom nodes

    Screenshot 2026-04-03 174813.png

    This is where all the custom nodes reside, and where you find and update them.

    • Install missing custom nodes: This will show you the missing nodes in the workflow so that you can download them.

Screenshot 2026-04-03 183821.png
  • When loading a new workflow, hitting the "Fit to screen" button is very helpful.

Run/ Queue area

Screenshot 2026-04-03 184335.png
  • Batch: How many times you want it to run.

  • Run: The drop down menu on the side gives you several options

    • Run on change: It will run every time you make a change (change image, change prompt, etc.)

    • Run instant: It runs every time you make a minor tweak (no....just no.)

Basic Nodes & Structure. A walkthrough:

In this section I will walk you through the basic nodes. I have attached an example workflow; I suggest you take the time to download it. Once downloaded, use the ComfyUI Manager or the links I provided to download the models and nodes required. Don't worry if you don't get all of them. We'll discuss those issues later down the road.

workflow (2).png

See the example workflow. It is broken down into several categories. I will discuss these at length, but here is a quick breakdown:

  • Models, checkpoints, clip text encoders, vae

  • Connectors & organizers

  • clip set last layer (sometimes called clip skip)

  • LoRA loaders

  • Flow and conditioning modifiers. (note: these are specific to certain types of models like Flux, Qwen, and Z-image, and their offspring)

  • Prompts and clip text encoders

  • Image Loading and latent images

  • Ksampler

  • VAE decode

  • Preview image

Here are other nodes that enhance your image:

  • Upscalers

  • Detailers

Models, clips, & Vaes:

Screenshot 2026-04-03 194207.png

Checkpoint loaders vs. diffusion model loaders:
Every model has three parts: a diffusion model, a CLIP, and a VAE. A checkpoint is just all of those merged together for convenience (not always the best). We will discuss each separately:

Grok was kind enough to explain them for me:

  • Diffusion Models (the heart of ComfyUI workflows): These are probabilistic models (typically a UNet architecture) that generate images by starting with pure Gaussian noise and gradually “denoising” it over many steps. In ComfyUI you wire the diffusion model (often labeled “KSampler” or “SamplerCustom”) to iteratively refine a noisy latent until it matches your prompt. The process is guided by conditioning (text, image, or control inputs) and controlled by parameters like steps, CFG scale, and scheduler.

  • CLIP (Contrastive Language-Image Pretraining): CLIP is the text encoder that translates your natural-language prompt into a numerical embedding the diffusion model can understand. In ComfyUI you load a CLIP model (e.g., CLIPTextEncode node) which produces two outputs—positive and negative conditioning—that steer the denoising process toward what you want and away from what you don’t. CLIP is the reason prompts work so effectively in Stable Diffusion pipelines.

  • VAE (Variational Autoencoder): The VAE acts as the image “compressor” and “decompressor.” It encodes full-resolution pixel images into a much smaller latent space where the diffusion model works efficiently, and it decodes the final latent back into a viewable image. In ComfyUI you connect VAE Encode (for input images or ControlNets) and VAE Decode (after sampling) nodes; swapping VAEs (e.g., SD1.5 vs. SDXL vs. custom fine-tunes) directly changes output quality, sharpness, and color fidelity without retraining the diffusion model.
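To make the VAE's compression concrete, here is a minimal sketch of the arithmetic. It assumes the common SD1.5/SDXL-style VAE, which downscales 8× per side into 4 latent channels; other model families use different factors.

```python
# Why diffusion happens in latent space: the VAE shrinks the image into a
# far smaller tensor. Assumes an SD1.5/SDXL-style VAE (8x downscale, 4
# latent channels) -- families like Flux use different numbers.
def latent_shape(width, height, channels=4, downscale=8):
    """Shape (channels, height, width) of the latent the sampler denoises."""
    return (channels, height // downscale, width // downscale)

pixel_values = 1024 * 1024 * 3            # RGB values in a 1024x1024 image
c, h, w = latent_shape(1024, 1024)
print((c, h, w))                          # (4, 128, 128)
print(pixel_values // (c * h * w))        # 48 -> ~48x fewer values to denoise
```

That size difference is why sampling is fast and why the decode step at the end is needed to get pixels back.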

Organizers/ connectors/ getset Nodes

Have you ever seen a typical workflow that looks like this?

Screenshot 2026-04-03 200303.png

These help organize your workflow to prevent "spaghetti"

image.png
  • Reroutes:

    Screenshot 2026-04-03 195626.png

    They act as guides for your wiring. Only one input, but you can have numerous outputs.

  • Getset Nodes (my favorite)

    Screenshot 2026-04-03 200629.png

    These allow you to jump from one place to another without having to run a spaghetti strand each time. You can have as many get nodes as you want for each set node. It's great for moving specific data back and forth (like images or other settings).

  • Anything everywhere:

    Screenshot 2026-04-03 201206.png

    This handy node will automatically connect everything that takes that type of spaghetti throughout the workflow.

    • Good for Models, clips, vae, latent, and some other specific items

    • Does not work for integers, float, string, and combo

    • ⚠️ Where this is put in the workflow matters. For example, if you have items hooked up after it (like LoRA loaders, clip skips, ControlNet, etc.) that modify the item before it reaches the sampler, then it will not work or will give you an error.

Clip set last layer (or clip skip)

image.png
  • The CLIP Set Last Layer node lets you control the processing depth of the CLIP text encoder by choosing which transformer layer to stop at when converting your prompt into text embeddings.

    • stop_at_clip_layer (integer, usually from -24 to -1):

      • -1 → Use all layers

      • -2 → Skip the very last layer

    • Each model and LoRA has different options. Pay attention to how the model is set up.
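As a toy sketch of what that negative index means: the layer list below is a stand-in for the hidden states of a 12-layer CLIP text encoder, not real CLIP internals.

```python
# Toy illustration of stop_at_clip_layer: the index simply selects which
# transformer layer's output gets handed to the diffusion model.
def clip_skip(layer_outputs, stop_at_clip_layer=-1):
    return layer_outputs[stop_at_clip_layer]

# Stand-in for the hidden states of a 12-layer CLIP text encoder.
layers = [f"hidden_states_layer_{i}" for i in range(1, 13)]
print(clip_skip(layers, -1))  # hidden_states_layer_12 (use all layers)
print(clip_skip(layers, -2))  # hidden_states_layer_11 (skip the last layer)
```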

LoRA Loaders

  • Screenshot 2026-04-03 202933.png

    There are several different types. Some affect only the model; others affect both the model and the CLIP.

  • The strength determines how much of an impact it has on the image.

    • strength_model: Controls how much the LoRA modifies the diffusion model. This affects composition, details, style, anatomy, etc. Most of the visible impact usually comes from this.

    • strength_clip (or clip weight): Controls how the LoRA influences your text prompt (positive/negative prompts).

  • You will sometimes see LoRA loaders in series, and sometimes they will be wired through only the model.

  • You can use more than one LoRA at a time. Keep in mind that they will interact with each other. Further, too many will "overcook" the image, making it full of details and noise and just plain sloppy.
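Under the hood, a LoRA stores a small low-rank update that gets added to the base weights, scaled by the strength. Here is a toy sketch with made-up numbers (real weights are huge tensors, and the update is computed as B @ A):

```python
# Toy sketch of how a LoRA's strength scales its effect on the weights.
def apply_lora(w, delta, strength):
    """Return w + strength * delta, element-wise (w and delta are 2D lists)."""
    return [[wv + strength * dv for wv, dv in zip(wr, dr)]
            for wr, dr in zip(w, delta)]

base = [[1.0, 0.0], [0.0, 1.0]]      # base model weights (made up)
delta = [[0.2, -0.1], [0.0, 0.4]]    # precomputed low-rank update (made up)

print(apply_lora(base, delta, 1.0))          # full effect of the LoRA
print(apply_lora(base, delta, 0.5))          # half effect
print(apply_lora(base, delta, 0.0) == base)  # True: strength 0 disables it
```

Stacking several LoRAs just means adding several of these deltas, which is why too many of them pushes the weights far from the base model and "overcooks" the image.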

Flow and conditioning modifiers:

Screenshot 2026-04-03 205039.png

There are several different ways that the model is adjusted. These two are the most common.

  • Model Sampling Aura Flow uses a flow-matching architecture (similar to Flux but with its own optimizations). Standard diffusion sampling (like in SD 1.5 or SDXL) doesn't work optimally on it. The ModelSamplingAuraFlow node "patches" the loaded model to apply the correct internal sampling configuration. You'll find this most often in Z-Image or Qwen workflows

  • Flux Guidance is the Flux-specific equivalent of CFG Scale (Classifier-Free Guidance) in older Stable Diffusion models like SD 1.5 or SDXL

    • It scales the influence of your prompt on the model. Higher numbers are more strict.

    • standard is 3.5
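The classifier-free guidance idea behind both sliders can be sketched in a few lines: each step the model predicts noise with and without the prompt, and the scale extrapolates from one toward the other. The numbers below are made up; real implementations do this on large noise tensors.

```python
# Sketch of classifier-free guidance, the idea behind the CFG slider.
def cfg_mix(uncond, cond, cfg_scale):
    """Extrapolate from the unconditional prediction toward the prompted one."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]

uncond = [0.10, 0.20]   # made-up noise prediction without the prompt
cond = [0.30, 0.10]     # made-up noise prediction with the prompt

print(cfg_mix(uncond, cond, 1.0))  # scale 1: just the conditional prediction
print(cfg_mix(uncond, cond, 3.5))  # higher scale pushes harder toward the prompt
```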

Prompts & Clip Text Encoders

image.png
  • This is where you enter your prompt that tells the model what to do.

  • Clip Text encoder:

    • It takes your text prompt as input.

    • Uses a CLIP model to analyze the meaning and context of the words.

    • Outputs conditionings (embeddings) that guide the diffusion process during image generation.

    • Think of the CLIP text encoder as the "translator" that turns your natural language description into a language the Stable Diffusion model speaks.

  • Positive prompt: There are two basic formats:

    • Danbooru Tags: used mainly in SDXL-based models (Illustrious, Pony, etc.)

      • They look like this: "1girl, blonde hair, green eyes, holding, breasts, striped shirt, eating a cat, night, golden hour,"

      • Most of these will start with descriptors like "Masterpiece, photorealistic, 8k, hires, highly detailed, score8_up"...

      • They can be weighted with parentheses, as such: (green eyes) or ((green eyes)), or you can manually weight them: (green eyes:1.5)

    • Natural Language: used on flow models like Flux, ZIT, Anima

      • They look like this: "Photorealistic image of a woman with blonde hair and piercing green eyes. She has large breasts and is wearing a striped shirt. She is taking a selfie of her eating a cat. Night, golden hour lighting, blurred background"

      • These are typically not weighted as above.

      • 💡Note: It is very helpful to use an LLM to enhance your prompt to step up your game.

  • Negative prompt:

    • Not all models use them. Guidance-distilled models like Flux and ZIT run without classifier-free guidance, so they do not require them.

    • A normal negative prompt looks similar to this: "bad anatomy, blurry, extra finger, missing limbs, extra limbs, censored, jpg artifacts, watermark"

  • Use of separate text strings to enter data:

    image.png

    Occasionally you will see the prompt being entered like this:

    • This allows the same prompt to be used in multiple places.

    • It also allows combining of prompts through a "join strings" or "concatenate text" node (not shown).
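The parenthesis weighting mentioned above follows a simple rule: each extra pair of parentheses multiplies the token's weight by 1.1, and an explicit :number sets it directly. Here is a heavily simplified parser to show the math (not ComfyUI's actual prompt parser):

```python
# Simplified sketch of parenthesis weighting in prompts.
def token_weight(token):
    """Weight of a single token like '((green eyes))' or '(green eyes:1.5)'."""
    if ":" in token.strip("()"):
        return float(token.strip("()").split(":")[1])  # explicit weight
    depth = 0
    while token.startswith("(") and token.endswith(")"):
        token = token[1:-1]
        depth += 1
    return round(1.1 ** depth, 3)  # each paren pair multiplies by 1.1

print(token_weight("green eyes"))        # 1.0
print(token_weight("(green eyes)"))      # 1.1
print(token_weight("((green eyes))"))    # 1.21
print(token_weight("(green eyes:1.5)"))  # 1.5
```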

The latent Image:

image.png

Think of the latent image as the canvas that the diffusion model paints on. It determines the size of the image. It can also be used to inject embeddings from an image (I2I, or image-to-image), which some models (not all) will read and use as guidance.

Refer to the above image. I will describe all of the parts and pieces. Note that most are not used at the same time, so they will not all apply to every workflow:

  • Load Image: pretty self-explanatory. Pay attention to the dimensions at the bottom; those are the original width and height of the image you loaded. The ratio is sometimes critical (not so much the size, although the larger the size, the clearer the end result).

  • Empty Latent Image: this determines the height, width, and batch size (how many images are created with each run).

  • Get Resolution node: a very helpful node for resizing, as it has height and width outputs.

  • SDXL Empty Latent Picker (or similar node): There are several different premade aspect-ratio nodes. These are incredibly helpful, especially for media posts, as Instagram, etc. have preset sizes. These replace the Empty Latent Image node.

  • VAE Encode: This nifty node takes the image you loaded and turns it into a language that the diffusion model can understand.

  • Resize Image node (or similar): You will see these a lot when using masks, controlnet, upscalers, etc.

    • They resize the image in several different ways. Each has a benefit:

      • Upscale

      • Stretch

      • Crop

      • Pad

      • Resize

    • They also downscale the image to the proper size.
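A common rule these resize nodes implement is "fit inside a target box, keep the aspect ratio, snap to a multiple of 8" (latent dimensions want sizes divisible by 8). A minimal sketch; the helper name is mine, not a real node:

```python
# Sketch of aspect-preserving resize snapped to latent-friendly sizes.
def fit_resize(src_w, src_h, max_w, max_h, snap=8):
    """Fit (src_w, src_h) inside (max_w, max_h), snapped to multiples of 8."""
    scale = min(max_w / src_w, max_h / src_h)
    w = int(src_w * scale) // snap * snap
    h = int(src_h * scale) // snap * snap
    return w, h

print(fit_resize(3000, 2000, 1024, 1024))  # (1024, 680)
```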

🧠 The Ksampler

image.png

This is the brains of the whole workflow. I'll walk through each part.

image.png
  • seed: this is the noise seed that the image is generated from.

  • control after generate: this is connected to the seed; it can be fixed, increment (it will move up one after every generation), or random.

  • steps: This number varies between models. For example, Z-Image Turbo requires 9, Flux works between 20-35, and Illustrious can go between 25-50.

    • the more steps, the more times the image is refined

  • cfg: This determines how closely the generator follows your prompt, or how creative it is. Lower numbers give the model more creative freedom; higher numbers force it to follow the prompt more strictly.

  • Samplers & Schedulers: These go hand in hand

    • Samplers

      • Sampler: The algorithm that determines how the model interprets noise predictions and denoises the latent image step-by-step.

      • Controls aspects like generation speed, detail sharpness, convergence (whether it stabilizes), and overall artistic style.

      • Common examples: Euler, DPM++ 2M, DPM++ 2M Karras (or SDE variants), Heun, LMS, UniPC, DDIM, LCM.

      • Ancestral samplers (often marked with "a" or "ancestral", e.g., Euler a, DPM++ 2M SDE) add randomness/noise at each step → more variation and creativity, but less deterministic/convergent.

      • Non-ancestral samplers tend to converge more predictably for consistent results.

    • Schedulers

      • Scheduler: Controls how the noise level (sigma values) changes across the total number of steps — essentially the "schedule" or curve of denoising intensity.

      • Affects smoothness, detail retention, efficiency, and how aggressively noise is removed early vs. late in the process.

      • Common examples: normal (balanced linear), karras (smooth, high-quality details), exponential (fast decay), simple, sgm_uniform, beta, ddim_uniform.

      • Pairings matter: Some samplers perform best with specific schedulers (e.g., Euler with normal or karras for sharp results; DPM++ variants often shine with karras or beta).

  • Denoise:

    • Determines how much of the original latent gets noised up (and thus how much the final image can deviate from the input).

      • denoise = 1.0 (default): Full denoising. The input latent is almost completely overwritten with new noise. This is ideal for text-to-image (txt2img) from an empty latent or when you want maximum creative freedom / big changes.

      • denoise < 1.0 (e.g. 0.6, 0.4, 0.2): Partial denoising. Only a portion of the input latent is replaced with noise, so the sampler preserves more of the original structure, composition, colors, and details.

    • Used heavily for image-to-image (img2img), refinement, upscaling, inpainting, or subtle style/pose adjustments.
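One way to picture denoise: it decides how far down the noise schedule the sampler starts, so only a fraction of the steps actually run. A simplified sketch of that relationship (the helper name is mine; real samplers trim the sigma schedule rather than counting steps, but the effect is roughly this):

```python
# Sketch of what denoise < 1.0 does in img2img: only the tail of the
# schedule runs, so part of the input latent's structure survives.
def img2img_steps(steps, denoise):
    """Approximate number of sampling steps actually executed."""
    return round(steps * denoise)

print(img2img_steps(30, 1.0))  # 30 -> full txt2img-style generation
print(img2img_steps(30, 0.5))  # 15 -> keeps much of the input's composition
print(img2img_steps(30, 0.2))  # 6  -> subtle refinement only
```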

For more information on the different KSamplers, please read my other article here: https://civitai.com/articles/24343/a-ksampler-by-any-other-name

The Final Image

image.png

VAE decoder:

  • This takes the latent image created by the KSampler and, using the VAE (as discussed above), turns it back into an image.

  • VAE Decode (tiled): On computers with low VRAM, or when the image is very large (also for video), you may come across a VAE tiled node. What this does is break the image up into smaller parts to decode, then, using the overlap, stitches them back together. It uses less VRAM, but takes more 🕜
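The tiling idea can be sketched in one dimension: cover the length with overlapping tiles so the seams can be blended after decoding. The tile and overlap sizes below are arbitrary, and this assumes the length is at least one tile.

```python
# Sketch of tiled decoding along one axis: overlapping tiles cover the
# image; the shared region lets the seams be blended away after decoding.
def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` covering 0..length."""
    stride = tile - overlap
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:      # make sure the far edge is covered
        starts.append(length - tile)
    return starts

print(tile_starts(96, 32, 8))   # [0, 24, 48, 64]
```

The same offsets are computed for both axes, each tile is decoded on its own (less VRAM at once), and the overlaps are cross-faded, which is why it is slower than decoding in one shot.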

Preview Image:

  • Like it says, this node lets you preview the image.

    • ⚠️ the "Preview Image" node does not 💾 save the image

Save Image:

image.png

This does what it says. By default it saves in the "output" subfolder of your ComfyUI folder.

Other Common (Advanced) Nodes:

image.png

Upscaler:

  • There are several different versions of this. They perform several important functions:

    • Upscale: Increase the size of the image

    • Upscale by model: will add details (skin, sharpness, filters, etc.) as it upscales

Detailer:

  • There are several different ways this is done. It basically detects parts of the image (face, eyes, hands) and corrects/enhances them.

I will go more in depth on these in a future article.

Part 1 summary:

We have gone over all of the important sections of a workflow from getting Comfy started to what models do, the different types of nodes, the sampler settings, and the final image. I hope that this was helpful.

I have attached the example workflow for your reference. I will also be using it in the next part as a tutorial.

Please comment and let me know what you think

Instagram: https://www.instagram.com/synth.studio.models/

Buy me a☕ https://ko-fi.com/lonecatone

This represents many hours of work. If you enjoy it, please 👍 like, 💬 comment, and feel free to ⚡ tip 😉
