Type | Workflows |
Published | Aug 28, 2024 |
Hash | AutoV2 84FBFC27A1 |
Omost generates highly controllable, customized images by manipulating the properties of objects in the scene, such as position, size, and direction. However, that project only supports SDXL models.
lllyasviel/Omost: Your image is almost there! (github.com)
Because the Flux model performs well on structured prompts in JSON format, the complex Omost approach can now be simulated by combining Ollama nodes with customized instructions.
At first, I followed Omost's coordinate and distance specification: I set up [row, column] coordinates and a far-near relationship, then used instructions to make the large language model understand them and reflect them in structured prompts.
After testing, I found that the Flux model has some understanding of coordinates, but it is not a strict geometric mapping, so the composition sometimes deviates. It does, however, reflect the relative position and distance of scene elements.
My conclusion is that Flux pays more attention to relative position, so I chose to describe orientation in natural language inside the JSON format. This also achieves relatively accurate control over the picture.
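The move from grid coordinates to natural-language orientation can be sketched in a few lines of Python. This is only an illustration: the 1 to 9 grid and the 1 to 10 distance scale come from the coordinate instruction later in this post, while the function names, the split of the grid into thirds, and the "long distance" label are assumptions made for the example, not part of the workflow.

```python
def describe_cell(row: int, col: int) -> str:
    """Map a [row, column] cell on the 9x9 grid to a rough position phrase."""
    if not (1 <= row <= 9 and 1 <= col <= 9):
        raise ValueError("row and column must be between 1 and 9")
    # Assumption: split each axis into thirds (1-3, 4-6, 7-9).
    vertical = "top" if row <= 3 else "bottom" if row >= 7 else "middle"
    horizontal = "left" if col <= 3 else "right" if col >= 7 else "center"
    if vertical == "middle" and horizontal == "center":
        return "center"
    if vertical == "middle":
        return horizontal
    if horizontal == "center":
        return f"{vertical} center"
    return f"{vertical}-{horizontal}"


def describe_distance(distance: int) -> str:
    """Map the 1 (closest) to 10 (farthest) scale to a coarse phrase."""
    if not 1 <= distance <= 10:
        raise ValueError("distance must be between 1 and 10")
    # Assumption: the cut-off values 3 and 7 are arbitrary.
    if distance <= 3:
        return "short distance"
    if distance <= 7:
        return "medium distance"
    return "long distance"


print(describe_cell(1, 9), "/", describe_distance(5))
```

A phrase produced this way could be written into a "relative_position" field in place of raw coordinates.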
Based on the above, I followed Omost's block description principle and wrote two versions of the JSON format:
1. Version with coordinate positioning.
{
"global_description": {
"title": "Theme and atmosphere of the scene",
"style": "Artistic style (e.g., Cyberpunk, Fantasy, Realism)",
"colors": ["Primary color", "Secondary color", "Additional color"],
"lighting": "Overall lighting conditions (e.g., Soft light, Neon)",
"description": "A comprehensive overview of the entire scene, including the mood, overall color scheme, lighting effects, and general style. This also includes detailed descriptions of the main subject, its physical appearance, posture, emotional expression, and interaction with the environment, as well as the background elements such as landscape or cityscape, atmospheric effects, and overall tone."
},
"subject": {
"description": "Main subject of the scene, including physical appearance, posture, emotional expression, and interaction with the surrounding environment. Describe any distinctive features, clothing, and how lighting affects the subject.",
"category": "Type (e.g., Human, Creature)",
"position": {
"coordinates": "[x, y]", # Flexibly define coordinates
"distance_to_viewer": "[distance]" # Flexibly define distance
}
},
"background": {
"description": "Description of the background elements, such as landscape, cityscape, or environmental features. Include details about atmospheric effects like fog, sunlight, or darkness, and how these elements contribute to the overall tone and mood of the scene.",
"category": "Type (e.g., Landscape, Cityscape)",
"position": {
"coordinates": "[x, y]", # Flexibly define coordinates for background elements
"distance_to_viewer": "[distance]" # Flexibly define distance for background elements
}
},
"objects": [
{
"type": "Object at the top",
"description": "Details about the object placed at the top of the scene, such as its size, shape, and material. Include information about the lighting (e.g., source of light, shadows) and the object's role in the composition (e.g., framing, leading the eye).",
"category": "Type (e.g., Light Source, Sky Element)",
"position": {
"coordinates": "[x, y]", # Flexibly define coordinates
"distance_to_viewer": "[distance]" # Flexibly define distance
}
},
{
"type": "Object at the bottom",
"description": "Details about the object placed at the bottom of the scene, such as its texture, material, and how it anchors the composition. Discuss the color and shadow play, and any interaction with the ground or other elements.",
"category": "Type (e.g., Ground Element, Terrain)",
"position": {
"coordinates": "[x, y]", # Flexibly define coordinates
"distance_to_viewer": "[distance]" # Flexibly define distance
}
},
{
"type": "Object on the left",
"description": "Description of the object located on the left side of the scene, including its form, texture, and color. Highlight how it interacts with adjacent objects and any leading lines or visual paths it creates.",
"category": "Type (e.g., Structure, Flora)",
"position": {
"coordinates": "[x, y]", # Flexibly define coordinates
"distance_to_viewer": "[distance]" # Flexibly define distance
}
},
{
"type": "Object on the right",
"description": "Description of the object located on the right side of the scene, including its material, shape, and any dynamic elements. Discuss the use of light and shadow, and its contribution to the overall balance of the scene.",
"category": "Type (e.g., Technology, Vehicle)",
"position": {
"coordinates": "[x, y]", # Flexibly define coordinates
"distance_to_viewer": "[distance]" # Flexibly define distance
}
},
{
"type": "Object in the center",
"description": "Detailed description of the central object, which might be a key symbol or focal point. Include its significance within the scene, the texture and materials, and how the lighting enhances its importance.",
"category": "Type (e.g., Key Symbol, Artifact)",
"position": {
"coordinates": "[x, y]", # Flexibly define coordinates
"distance_to_viewer": "[distance]" # Flexibly define distance
}
}
]
}
2. Version with relative-position descriptions.
{
"global_description": {
"title": "Theme and atmosphere of the scene",
"style": "Artistic style (e.g., Cyberpunk, Fantasy, Realism)",
"colors": ["Primary color", "Secondary color", "Additional color"],
"lighting": "Overall lighting conditions (e.g., Soft light, Neon)",
"description": "A comprehensive overview of the entire scene, including the mood, overall color scheme, lighting effects, and general style. This also includes detailed descriptions of the main subject, its physical appearance, posture, emotional expression, and interaction with the environment, as well as the background elements such as landscape or cityscape, atmospheric effects, and overall tone."
},
"subject": {
"description": "The central figure of the scene, possibly a woman standing in the middle of the setting. Her physical appearance, posture, and interaction with the environment are key focal points. Her positioning influences the placement of other elements in the scene.",
"category": "Type (e.g., Human, Creature)",
"position": {
"relative_position": "center", # The subject is placed centrally in the scene
"distance_to_viewer": "[distance]" # Flexibly define distance
}
},
"background": {
"description": "The background includes landscape elements like trees or buildings that surround the central figure. The environmental effects such as fog or sunlight contribute to the overall mood.",
"category": "Type (e.g., Landscape, Cityscape)",
"position": {
"relative_position": "behind the subject, filling the background", # Positioned relative to the subject
"distance_to_viewer": "[distance]" # Flexibly define distance for background elements
}
},
"objects": [
{
"type": "Object at the top",
"description": "Details about the object placed at the top of the scene, such as its size, shape, and material. This object could be, for example, the sky, clouds, or a light source that frames the top part of the scene.",
"category": "Type (e.g., Light Source, Sky Element)",
"position": {
"relative_position": "above the subject", # Positioned above the central figure
"distance_to_viewer": "[distance]" # Flexibly define distance
}
},
{
"type": "Object at the bottom",
"description": "Details about the object placed at the bottom of the scene, such as terrain elements or objects near the subject's feet, anchoring the scene.",
"category": "Type (e.g., Ground Element, Terrain)",
"position": {
"relative_position": "beneath the subject", # Positioned relative to the subject's feet or at the base of the scene
"distance_to_viewer": "[distance]" # Flexibly define distance
}
},
{
"type": "Object on the left",
"description": "Description of an object located on the left side of the scene, such as a structure or tree. This object is positioned to the left of the central figure and adds balance to the composition.",
"category": "Type (e.g., Structure, Flora)",
"position": {
"relative_position": "left of the subject", # Positioned to the left of the central figure
"distance_to_viewer": "[distance]" # Flexibly define distance
}
},
{
"type": "Object on the right",
"description": "Description of an object located on the right side of the scene, for instance, a vehicle or piece of technology. This object is placed to the right of the central figure.",
"category": "Type (e.g., Technology, Vehicle)",
"position": {
"relative_position": "right of the subject", # Positioned to the right of the central figure
"distance_to_viewer": "[distance]" # Flexibly define distance
}
},
{
"type": "Object in the center",
"description": "This object is a key focal point or symbol located in the center of the scene, perhaps near or interacting directly with the subject. Its position draws the viewer's attention and may have symbolic importance.",
"category": "Type (e.g., Key Symbol, Artifact)",
"position": {
"relative_position": "in front of the subject, central focus", # Positioned as the focal point in front of the subject
"distance_to_viewer": "[distance]" # Flexibly define distance
}
}
]
}
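Both templates carry `#` annotations, which work as hints for the language model but are not valid JSON. If you want to load a template in a script, for example to check its structure, the annotations have to be removed first. The sketch below is one hypothetical way to do that; the function name is made up for this example, and the template is a shortened excerpt of the relative-position version.

```python
import json


def strip_hash_comments(text: str) -> str:
    """Remove `# ...` annotations that sit outside JSON string literals."""
    cleaned = []
    for line in text.splitlines():
        in_string = False
        escaped = False
        cut = len(line)
        for i, ch in enumerate(line):
            if escaped:
                escaped = False
            elif ch == "\\" and in_string:
                escaped = True
            elif ch == '"':
                in_string = not in_string
            elif ch == "#" and not in_string:
                cut = i  # everything from here to the end of the line is an annotation
                break
        cleaned.append(line[:cut].rstrip())
    return "\n".join(cleaned)


template = """
{
  "subject": {
    "category": "Type (e.g., Human, Creature)",
    "position": {
      "relative_position": "center", # The subject is placed centrally in the scene
      "distance_to_viewer": "[distance]" # Flexibly define distance
    }
  }
}
"""

parsed = json.loads(strip_hash_comments(template))
print(parsed["subject"]["position"]["relative_position"])  # center
```

A `#` that appears inside a quoted value is left alone, so descriptions containing that character are not truncated.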
__________________________________________________________
The following is an example:
{
"global_description": {
"title": "Enchanted Garden",
"style": "Surrealism with Symbolism",
"colors": ["Pastel Green", "Soft Blue", "Pale Purple"],
"lighting": "Soft Focus of the Sun",
"description": "A serene and vibrant garden where a woman robot stands amidst lush greenery, blending seamlessly with the natural surroundings. Inspired by Frida Kahlo's works, this artwork combines surrealism with symbolism to convey inner turmoil and liberation."
},
"subject": {
"description": "A slender woman robot with delicate features standing at the center of the scene",
"category": "Humanoid Robot",
"position": {
"relative_position": "center",
"distance_to_viewer": "[medium distance]"
}
},
"background": {
"description": "An intricate arrangement of flowers, trees, and vines embracing the woman robot",
"category": "Lush Garden",
"position": {
"relative_position": "behind the subject, filling the background",
"distance_to_viewer": "[medium distance]"
}
},
"objects": [
{
"type": "Flowers above the subject",
"description": "Delicate flowers surrounding the woman robot's head and shoulders",
"category": "Floral Elements",
"position": {
"relative_position": "above the subject",
"distance_to_viewer": "[short distance]"
}
},
{
"type": "Tree on the left",
"description": "A slender tree growing to the woman robot's left, its branches stretching towards her",
"category": "Botanical Element",
"position": {
"relative_position": "left of the subject",
"distance_to_viewer": "[medium distance]"
}
},
{
"type": "Tree on the right",
"description": "A similar tree growing to the woman robot's right, its branches stretching towards her",
"category": "Botanical Element",
"position": {
"relative_position": "right of the subject",
"distance_to_viewer": "[medium distance]"
}
},
{
"type": "Gentle breeze at the bottom",
"description": "A soft breeze rustling through the leaves, creating a sense of peaceful connection with nature",
"category": "Atmospheric Element",
"position": {
"relative_position": "beneath the subject",
"distance_to_viewer": "[short distance]"
}
},
{
"type": "Central focus",
"description": "A key focal point or symbol located in front of the woman robot, perhaps near or interacting directly with her",
"category": "Key Symbol",
"position": {
"relative_position": "in front of the subject, central focus",
"distance_to_viewer": "[short distance]"
}
}
]
}
After that, give the large language model the instruction: "Change the girl's hair to white."
This produces a new prompt:
{
"global_description": {
"title": "Enchanted Garden",
"style": "Surrealism with Symbolism",
"colors": ["Pastel Green", "Soft Blue", "Pale Purple"],
"lighting": "Soft Focus of the Sun",
"description": "A serene and vibrant garden where a woman robot stands amidst lush greenery, blending seamlessly with the natural surroundings. Inspired by Frida Kahlo's works, this artwork combines surrealism with symbolism to convey inner turmoil and liberation."
},
"subject": {
"description": "A slender woman robot with delicate features standing at the center of the scene, her hair a stark white against the pastel colors",
"category": "Humanoid Robot",
"position": {
"relative_position": "center",
"distance_to_viewer": "[medium distance]"
}
},
"background": {
"description": "An intricate arrangement of flowers, trees, and vines embracing the woman robot",
"category": "Lush Garden",
"position": {
"relative_position": "behind the subject, filling the background",
"distance_to_viewer": "[medium distance]"
}
},
"objects": [
{
"type": "Flowers above the subject",
"description": "Delicate flowers surrounding the woman robot's head and shoulders, their colors contrasting beautifully with her stark white hair",
"category": "Floral Elements",
"position": {
"relative_position": "above the subject",
"distance_to_viewer": "[short distance]"
}
},
{
"type": "Tree on the left",
"description": "A slender tree growing to the woman robot's left, its branches stretching towards her, creating a sense of peaceful connection with nature",
"category": "Botanical Element",
"position": {
"relative_position": "left of the subject",
"distance_to_viewer": "[medium distance]"
}
},
{
"type": "Tree on the right",
"description": "A similar tree growing to the woman robot's right, its branches stretching towards her, harmonizing with the serene atmosphere",
"category": "Botanical Element",
"position": {
"relative_position": "right of the subject",
"distance_to_viewer": "[medium distance]"
}
},
{
"type": "Gentle breeze at the bottom",
"description": "A soft breeze rustling through the leaves, creating a sense of peaceful connection with nature and complementing the woman robot's ethereal presence",
"category": "Atmospheric Element",
"position": {
"relative_position": "beneath the subject",
"distance_to_viewer": "[short distance]"
}
},
{
"type": "Central focus",
"description": "A key focal point or symbol located in front of the woman robot, perhaps near or interacting directly with her, adding depth to the narrative she embodies",
"category": "Key Symbol",
"position": {
"relative_position": "in front of the subject, central focus",
"distance_to_viewer": "[short distance]"
}
}
]
}
With the seed value unchanged, the girl's hair turns white accurately and the composition does not change.
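One way to confirm that an edit like this left the layout alone is to compare only the position blocks of the two prompts. The following is a minimal sketch with shortened versions of the prompts above; `layout_of` and `same_layout` are hypothetical helper names, not part of the workflow.

```python
def layout_of(prompt: dict) -> dict:
    """Collect only the position blocks, keyed by element name."""
    layout = {
        "subject": prompt["subject"]["position"],
        "background": prompt["background"]["position"],
    }
    for obj in prompt.get("objects", []):
        layout[obj["type"]] = obj["position"]
    return layout


def same_layout(before: dict, after: dict) -> bool:
    """True when every element keeps its position and distance."""
    return layout_of(before) == layout_of(after)


center = {"relative_position": "center", "distance_to_viewer": "[medium distance]"}
behind = {
    "relative_position": "behind the subject, filling the background",
    "distance_to_viewer": "[medium distance]",
}
left = {"relative_position": "left of the subject", "distance_to_viewer": "[medium distance]"}

before = {
    "subject": {"description": "A slender woman robot", "position": center},
    "background": {"description": "Flowers, trees, and vines", "position": behind},
    "objects": [{"type": "Tree on the left", "description": "A slender tree", "position": left}],
}
after = {
    "subject": {"description": "A slender woman robot, her hair a stark white", "position": center},
    "background": {"description": "Flowers, trees, and vines", "position": behind},
    "objects": [{"type": "Tree on the left", "description": "A slender tree", "position": left}],
}

print(same_layout(before, after))  # True
```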
The workflow basically uses only the Flux text-to-image nodes and the Ollama node in ComfyUI. Its essence lies in the instructions given to the large language model in the Ollama node:
1. First, use the large language model to expand the input sentence into a relatively detailed prompt.
Your task is to generate high-quality writing suggestions for image prompt words according to the user's input and using natural language that conforms to AI drawing logic, which needs to include the theme direction, stylized suggestions and artistic aesthetic optimization of the creative content. You will follow the following steps:
1. Understand the input and expand it: add concrete and vivid details, and generate text prompts that accurately describe the image content. Must prompt the theme, content, object and its behavior, scene environment, detail description, composition, perspective, color, tone, light and shadow, artistic style, painting techniques, emotional expression and other elements.
2. Reference to artistic style: according to the needs of users, look for painting knowledge, aesthetic theory, painting technology, etc. from their own knowledge base to assist the writing of prompt words. Add style words or artists' names that conform to the theme and content as artistic style guides.
3. Emphasize the aesthetic effect: ensure that the works achieve the best artistic and technical effect, highlight the aesthetic impact and attraction of painting art, and ensure that the works are beautiful, shocking and thought-provoking. Referring to the current global popular painting aesthetic trends, we can get inspiration from the painting styles and creative ideas of Pinterest, MidJourney, civitai and other platforms, and we can also learn from popular culture, movies, TV series, novels, games and lyrics.
4. Polish the text: Use natural language to polish and expand the user's input, and make it more detailed based on the above three steps, while trying to keep the user's original intention.
5. Describe the composition and perspective: Make sure that the prompt contains a description of the composition and perspective. In the absence of such descriptions, analyze the whole text and add appropriate composition and perspective details. The description format should be: "the composition of this picture is ..., and its perspective structure is ...".
6. Describe the main action: ensure that the prompt contains a specific description of the main action. If there is no action description, analyze and add appropriate action description based on the scene and environment to accurately explain the behavior of the subject.
7. Standardize the output of prompt words: answer in English only. Just give me the final result, without any prefix or suffix. There is no need to say "This is a generated text prompt:". Keep it between 250 and 300 tokens. No need to tell me how many tokens have been used.
The following text is used as input:
2. Second, pass the result to the large language model again so that it understands and polishes the prompt, then outputs it in JSON format. The following instruction receives the natural-language prompt from the previous step and generates a prompt in JSON format.
You are an AI drawing expert. Your task is to understand the input, polish and optimize it, and keep the theme, content, object and behavior, scene environment, detail description, composition, perspective, color, tone, light and shadow, artistic style, painting techniques, emotional expression and other elements. Answer in English only. Just give me the final result, without any prefix or suffix. There is no need to say "This is the generated text prompt:".
Use the following json format to generate the final prompt:“
{
"global_description": {
"title": "Theme and atmosphere of the scene",
"style": "Artistic style (e.g., Cyberpunk, Fantasy, Realism)",
"colors": ["Primary color", "Secondary color", "Additional color"],
"lighting": "Overall lighting conditions (e.g., Soft light, Neon)",
"description": "A comprehensive overview of the entire scene, including the mood, overall color scheme, lighting effects, and general style. This also includes detailed descriptions of the main subject, its physical appearance, posture, emotional expression, and interaction with the environment, as well as the background elements such as landscape or cityscape, atmospheric effects, and overall tone."
},
"subject": {
"description": "The central figure of the scene, possibly a woman standing in the middle of the setting. Her physical appearance, posture, and interaction with the environment are key focal points. Her positioning influences the placement of other elements in the scene.",
"category": "Type (e.g., Human, Creature)",
"position": {
"relative_position": "center", # The subject is placed centrally in the scene
"distance_to_viewer": "[distance]" # Flexibly define distance
}
},
"background": {
"description": "The background includes landscape elements like trees or buildings that surround the central figure. The environmental effects such as fog or sunlight contribute to the overall mood.",
"category": "Type (e.g., Landscape, Cityscape)",
"position": {
"relative_position": "behind the subject, filling the background", # Positioned relative to the subject
"distance_to_viewer": "[distance]" # Flexibly define distance for background elements
}
},
"objects": [
{
"type": "Object at the top",
"description": "Details about the object placed at the top of the scene, such as its size, shape, and material. This object could be, for example, the sky, clouds, or a light source that frames the top part of the scene.",
"category": "Type (e.g., Light Source, Sky Element)",
"position": {
"relative_position": "above the subject", # Positioned above the central figure
"distance_to_viewer": "[distance]" # Flexibly define distance
}
},
{
"type": "Object at the bottom",
"description": "Details about the object placed at the bottom of the scene, such as terrain elements or objects near the subject's feet, anchoring the scene.",
"category": "Type (e.g., Ground Element, Terrain)",
"position": {
"relative_position": "beneath the subject", # Positioned relative to the subject's feet or at the base of the scene
"distance_to_viewer": "[distance]" # Flexibly define distance
}
},
{
"type": "Object on the left",
"description": "Description of an object located on the left side of the scene, such as a structure or tree. This object is positioned to the left of the central figure and adds balance to the composition.",
"category": "Type (e.g., Structure, Flora)",
"position": {
"relative_position": "left of the subject", # Positioned to the left of the central figure
"distance_to_viewer": "[distance]" # Flexibly define distance
}
},
{
"type": "Object on the right",
"description": "Description of an object located on the right side of the scene, for instance, a vehicle or piece of technology. This object is placed to the right of the central figure.",
"category": "Type (e.g., Technology, Vehicle)",
"position": {
"relative_position": "right of the subject", # Positioned to the right of the central figure
"distance_to_viewer": "[distance]" # Flexibly define distance
}
},
{
"type": "Object in the center",
"description": "This object is a key focal point or symbol located in the center of the scene, perhaps near or interacting directly with the subject. Its position draws the viewer's attention and may have symbolic importance.",
"category": "Type (e.g., Key Symbol, Artifact)",
"position": {
"relative_position": "in front of the subject, central focus", # Positioned as the focal point in front of the subject
"distance_to_viewer": "[distance]" # Flexibly define distance
}
}
]
}
”
Only output a prompt in JSON format.
The following text is used as input:
If you want to use instructions with coordinate references, use this version instead:
You are an AI drawing expert. Your task is to understand the input, polish and optimize it, and keep the theme, content, object and behavior, scene environment, detail description, composition, perspective, color, tone, light and shadow, artistic style, painting techniques, emotional expression and other elements. Answer in English only. Just give me the final result, without any prefix or suffix. There is no need to say "This is the generated text prompt:".
Please ensure that you follow this coordinate system precisely to achieve accurate positioning of elements within the canvas:
Coordinate Range:
Row (Vertical Axis): Values range from 1 to 9, increasing from the top edge to the bottom edge of the canvas.
Column (Horizontal Axis): Values range from 1 to 9, increasing from the left edge to the right edge of the canvas.
Coordinate Format:
Coordinates should be written in the format [Row, Column]. For example, [5, 5] represents the center of the canvas. [1, 1] represents the top-left corner, while [9, 9] represents the bottom-right corner.
Application Examples:
To place an element in the top-right corner of the canvas, use the coordinates [1, 9]. To position an element at the bottom center of the canvas, use the coordinates [9, 5].
Distance to Viewer:
Range: The distance between the element and the viewer is represented by a value ranging from 1 to 10.
1: Represents the closest possible distance, typically used for extreme close-ups.
10: Represents the farthest possible distance, typically used for elements in the extreme background or horizon.
Application:
Elements that are meant to appear very close to the viewer, such as a subject's face in a portrait, should use a value near 1.
Background elements, such as distant mountains or sky, should use a value near 10.
Use the following json format to generate the final prompt:“
{
"global_description": {
"title": "Theme and atmosphere of the scene",
"style": "Artistic style (e.g., Cyberpunk, Fantasy, Realism)",
"colors": ["Primary color", "Secondary color", "Additional color"],
"lighting": "Overall lighting conditions (e.g., Soft light, Neon)",
"description": "A comprehensive overview of the entire scene, including the mood, overall color scheme, lighting effects, and general style. This also includes detailed descriptions of the main subject, its physical appearance, posture, emotional expression, and interaction with the environment, as well as the background elements such as landscape or cityscape, atmospheric effects, and overall tone."
},
"subject": {
"description": "Main subject of the scene, including physical appearance, posture, emotional expression, and interaction with the surrounding environment. Describe any distinctive features, clothing, and how lighting affects the subject.",
"category": "Type (e.g., Human, Creature)",
"position": {
"coordinates": "[x, y]", # Flexibly define coordinates
"distance_to_viewer": "[distance]" # Flexibly define distance
}
},
"background": {
"description": "Description of the background elements, such as landscape, cityscape, or environmental features. Include details about atmospheric effects like fog, sunlight, or darkness, and how these elements contribute to the overall tone and mood of the scene.",
"category": "Type (e.g., Landscape, Cityscape)",
"position": {
"coordinates": "[x, y]", # Flexibly define coordinates for background elements
"distance_to_viewer": "[distance]" # Flexibly define distance for background elements
}
},
"objects": [
{
"type": "Object at the top",
"description": "Details about the object placed at the top of the scene, such as its size, shape, and material. Include information about the lighting (e.g., source of light, shadows) and the object's role in the composition (e.g., framing, leading the eye).",
"category": "Type (e.g., Light Source, Sky Element)",
"position": {
"coordinates": "[x, y]", # Flexibly define coordinates
"distance_to_viewer": "[distance]" # Flexibly define distance
}
},
{
"type": "Object at the bottom",
"description": "Details about the object placed at the bottom of the scene, such as its texture, material, and how it anchors the composition. Discuss the color and shadow play, and any interaction with the ground or other elements.",
"category": "Type (e.g., Ground Element, Terrain)",
"position": {
"coordinates": "[x, y]", # Flexibly define coordinates
"distance_to_viewer": "[distance]" # Flexibly define distance
}
},
{
"type": "Object on the left",
"description": "Description of the object located on the left side of the scene, including its form, texture, and color. Highlight how it interacts with adjacent objects and any leading lines or visual paths it creates.",
"category": "Type (e.g., Structure, Flora)",
"position": {
"coordinates": "[x, y]", # Flexibly define coordinates
"distance_to_viewer": "[distance]" # Flexibly define distance
}
},
{
"type": "Object on the right",
"description": "Description of the object located on the right side of the scene, including its material, shape, and any dynamic elements. Discuss the use of light and shadow, and its contribution to the overall balance of the scene.",
"category": "Type (e.g., Technology, Vehicle)",
"position": {
"coordinates": "[x, y]", # Flexibly define coordinates
"distance_to_viewer": "[distance]" # Flexibly define distance
}
},
{
"type": "Object in the center",
"description": "Detailed description of the central object, which might be a key symbol or focal point. Include its significance within the scene, the texture and materials, and how the lighting enhances its importance.",
"category": "Type (e.g., Key Symbol, Artifact)",
"position": {
"coordinates": "[x, y]", # Flexibly define coordinates
"distance_to_viewer": "[distance]" # Flexibly define distance
}
}
]
}
”
3. If you need to modify the image content, add another Ollama node and use a [String List] to combine the inputs, so that input 2 (below) is modified according to input 1 (above). Put the following instruction in string 2.
The above text is input 1. You are an AI drawing expert and have deep knowledge of the JSON format. You will understand input 2, and modify the prompt words in json format in input 2 according to the statements in input 1, and finally output them in the format of input 1. Answer in English only. Just give me the final result, without any prefix or suffix. There is no need to say "This is the generated text prompt:". The following text is used as input 2:
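The three steps above can also be tried outside ComfyUI by sending the same instructions to a locally running Ollama server over its HTTP API. This is a sketch under stated assumptions, not the workflow itself: it swaps the ComfyUI Ollama node for a direct call to the default local endpoint, the instruction constants are placeholders to be filled with the full texts above, and all function names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint
MODEL = "llama3.1"

# Placeholders: paste the full instruction texts from steps 1 to 3 here.
EXPAND_INSTRUCTION = "<step 1 instruction>"
TO_JSON_INSTRUCTION = "<step 2 instruction>"
EDIT_INSTRUCTION = "<step 3 instruction>"


def build_payload(instruction: str, user_text: str) -> dict:
    """Instruction first, input text after it, as in steps 1 and 2."""
    return {"model": MODEL, "prompt": f"{instruction}\n{user_text}", "stream": False}


def build_edit_text(edit_request: str, json_prompt: str) -> str:
    """Step 3 layout: input 1 (the change), the instruction, then input 2 (the JSON)."""
    return f"{edit_request}\n{EDIT_INSTRUCTION}\n{json_prompt}"


def ask_ollama(payload: dict) -> str:
    """POST one non-streaming request and return the generated text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def idea_to_json_prompt(idea: str) -> str:
    """Step 1 (expand) followed by step 2 (convert to the JSON format)."""
    detailed = ask_ollama(build_payload(EXPAND_INSTRUCTION, idea))
    return ask_ollama(build_payload(TO_JSON_INSTRUCTION, detailed))


def edit_json_prompt(edit_request: str, json_prompt: str) -> str:
    """Step 3: apply a natural-language change to an existing JSON prompt."""
    payload = {"model": MODEL, "prompt": build_edit_text(edit_request, json_prompt), "stream": False}
    return ask_ollama(payload)
```

With Ollama running, `idea_to_json_prompt("a woman robot in a garden")` would return the JSON prompt text, and `edit_json_prompt("Change the girl's hair to white.", prompt_text)` would return the edited version. Keeping the payload builders separate from the network call makes it easy to inspect exactly what text the model receives.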
Please use this page to correct the JSON format.
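A quick local check can also catch malformed output before it reaches the image model. The sketch below assumes the reply may contain extra text around the JSON object; the helper names and the list of required keys are taken from the templates in this post and are otherwise hypothetical.

```python
import json

REQUIRED_KEYS = ("global_description", "subject", "background", "objects")


def extract_json(reply: str) -> dict:
    """Parse the outermost {...} span found in a model reply."""
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    # json.loads raises json.JSONDecodeError if the span is malformed.
    return json.loads(reply[start:end + 1])


def missing_keys(prompt: dict) -> list:
    """List the top-level template keys that the prompt lacks."""
    return [key for key in REQUIRED_KEYS if key not in prompt]


reply = 'Here is the result: {"subject": {"category": "Humanoid Robot"}} Hope this helps.'
prompt = extract_json(reply)
print(missing_keys(prompt))  # ['global_description', 'background', 'objects']
```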
Image prompt reverse inference: joy_caption
https://github.com/StartHua/Comfyui_CXH_joy_caption/tree/main
Upscaling node: 【Comfyui_TTP_Toolset】 by ttplanet
Amazing Flux_8Mega_Pixel_image_upscale_process - v2.0 | Stable Diffusion Workflows | Civitai
Ollama download and installation:
Ollama runs independently of ComfyUI and must be downloaded from https://ollama.com. Installation is simple. After installing, press Win+R to open the Run dialog, enter cmd to open a Command Prompt window, and then enter ollama run llama3.1; the model will be downloaded automatically. You can also try other models from the website above. Models are downloaded to drive C by default. To change the path, add OLLAMA_MODELS to the system environment variables and set it to the path you want.
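To see which folder is currently in effect, a small script can read the same variable. This is a sketch only: the fallback to a ".ollama/models" folder in the user's home directory is an assumption about the default location and may differ by platform or install method, and the function name is hypothetical.

```python
import os
from pathlib import Path


def ollama_models_dir(env=None) -> Path:
    """Return the folder named by OLLAMA_MODELS, or the assumed default."""
    env = os.environ if env is None else env
    custom = env.get("OLLAMA_MODELS")
    if custom:
        return Path(custom)
    # Assumption: default is <home>/.ollama/models when the variable is unset.
    return Path.home() / ".ollama" / "models"


print(ollama_models_dir())
```

If the variable was just added, open a new Command Prompt window before running the script so that the updated environment is picked up.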