SDXL: Can it play in 14k? A demonstration of the capacity for tech merges in checkpoint development.

Introduction

In the rapidly evolving field of AI and machine learning, the quest for more sophisticated, high-resolution image generation has been relentless. The introduction of Stable Diffusion models marked a significant leap forward, and SDXL extended it further, bringing marked improvements in plain-text prompt comprehension and native 1k (1024 x 1024px) image generation to consumer-grade hardware. On those shoulders stands a plethora of technology that has yet to see full utilisation and implementation. This article explores some of the technology behind Aventis Horizon v2.

Aventis Horizon introduces a suite of enhancements that elevate it above conventional models. Central to its development is the use of purpose-trained Low-Rank Adaptations (LoRAs) and model manipulation technologies, which significantly augment its base capabilities, enabling the production of high-fidelity landscape images at a native 14k resolution of 7168 x 2048 pixels. Beyond this impressive resolution, Aventis Horizon's advanced comprehension and pattern recognition abilities set new standards for what is achievable in SDXL-based single-pass image generation.

(4096 x 2024px, no assistance, model only)

Tech Merges: Enhancing Model Capability through Collaboration

At the core of Aventis Horizon's evolution are the "Tech LoRAs", each meticulously crafted to address and enhance specific operational aspects of the model. Some popular technologies that have propagated through the community as tech merges rather than retrained models include SDXL Turbo, SDXL Lightning, LCM (Latent Consistency Model), and DPO (Direct Preference Optimization).

This methodology applies a set of trained weights to the model in a manner that can drastically shape its output capabilities. Aventis Horizon uses these techniques to keep extending the capabilities of the SDXL platform while maintaining maximal compatibility with existing SDXL resources, starting in the following ways:

Style and Mixed Media: These LoRAs are specifically designed to enrich Aventis Horizon's ability to seamlessly blend various styles and media. This enhancement not only boosts the model's versatility but also elevates the dynamic nature of the images it generates, enabling the creation of visually compelling and diverse artwork that challenges the traditional boundaries of AI-generated content.
(Prompt: mixed media papercraft with liquid paint and , martial arts combat scene, spilled paint scene, comic style "POW" overlay)
Creativity & Prompt Alignment: Elevates Aventis Horizon's capability to interpret user prompts creatively, producing outputs that not only adhere closely to the specified criteria but also introduce an element of artistic innovation. This ensures a rich blend of accuracy and creativity, offering users images that surpass conventional expectations in both fidelity and artistic expression.
(Prompt: meerkat pirate, standing on an ornate jewelled box, proud expression, victory pose)
Emotive Responsiveness & Narrative Depth: Amplifies the model's proficiency in crafting images with a heightened sense of emotional resonance and narrative complexity. By fine-tuning its response to the emotive cues within prompts, Aventis Horizon delivers outputs that not only capture the aesthetic essence but also evoke a deeper emotional connection, enriching the viewer's experience with stories and sentiments that resonate on a personal level.
(Prompt: an expression of pensive disdain)
Pattern Recognition and Repetitive Structures: Dedicated LoRAs in this category enhance Aventis Horizon's proficiency in handling intricate patterns and repetitive structures. This capability is essential for generating images with a high level of detail and structural complexity, showcasing the model's advanced understanding of both natural and artificial forms.
(Prompt: hypermaximal fractal map)

The strategy behind tech merges involves leveraging LoRA-guided checkpoint merging, a technique that significantly bolsters the model's output capabilities. This process allows for substantial enhancements without the need for extensive retraining, streamlining the path towards model customization and iterative development. By emphasizing efficiency and adaptability, tech merges highlight a forward-thinking approach to model improvement.
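As a rough illustration of what merging a LoRA into a checkpoint means numerically, the sketch below folds a low-rank update into a single weight matrix using the commonly used form W' = W + strength * (alpha / rank) * (up @ down). It is a toy example in plain Python, not the actual Aventis Horizon pipeline; the function names, the 2 x 2 layer, and the strength value are all assumptions chosen for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, adequate for a toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight: Matrix, lora_up: Matrix, lora_down: Matrix,
               alpha: float, strength: float = 1.0) -> Matrix:
    """Fold a LoRA into a weight matrix.

    Computes weight + strength * (alpha / rank) * (lora_up @ lora_down).
    `strength` is a hypothetical user-chosen merge multiplier.
    """
    rank = len(lora_down)  # rows of the down-projection = LoRA rank
    scale = strength * alpha / rank
    delta = matmul(lora_up, lora_down)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy 2 x 2 layer with a rank-1 LoRA (illustrative values only).
base = [[1.0, 0.0], [0.0, 1.0]]
up = [[1.0], [2.0]]      # shape (out_features, rank)
down = [[0.5, -0.5]]     # shape (rank, in_features)

merged = merge_lora(base, up, down, alpha=1.0, strength=0.5)
print(merged)
```

Once the update is folded in, the merged matrix has the same shape as the original, which is why a checkpoint produced this way can be loaded like any other SDXL checkpoint without a separate LoRA file.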

Through the implementation of tech merges, Aventis Horizon exemplifies a sophisticated blend of technology and creativity, setting new standards for the capabilities of SDXL models.

The Challenge of Exceeding Training Resolutions

Stable Diffusion models are trained on vast datasets of images at specific resolutions. This training process equips them with the ability to generate new images by understanding and replicating the patterns and textures found in their training data. However, when asked to produce images at resolutions higher than those in their training datasets, the models encounter a significant challenge.

The primary issue is that the model's understanding of patterns and textures is limited to what it has seen during training. At higher resolutions, the model must often generate more detail than it has learned, entering a realm of "untrained latent space." In this space, the model must make educated guesses to fill in details that were not present in its training data. In extreme circumstances a model can lose coherence completely and return nothing but noise.

The following examples compare the kinds of coherence loss that can occur between models at extreme resolutions. These results do not indicate the quality of a model at its native resolutions.

For ease of testing, to reduce the number of variables, and to ensure that each model was tested adequately, each of the following models was tested using a very basic workflow.

All tests were conducted under the following conditions unless otherwise stated. The original generations, with embedded workflow data, are provided in Attachments.

All images are generated:

in a single pass from a 7168 x 2048px empty latent (about 14.7 megapixels) on an RTX 4090 24 GB GPU
at 30 steps with a Classifier Free Guidance (CFG) scale of 4.5
using the dpmpp_3m_sde_gpu sampler with the karras scheduler
with the same plain-text positive prompt passed to both of SDXL's text encoders (CLIP-L and CLIP-G), and with no negative prompt
with a fixed seed of 607325729159393
using a selection of popular and established community models.
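To put the test resolution in perspective, the short calculation below compares the 7168 x 2048px canvas with a 1024 x 1024px image and derives the size of the latent the sampler actually works on. It assumes the usual SDXL VAE downscale factor of 8 per side and takes 1024 x 1024px as the nominal training resolution; the constant and function names are illustrative only.

```python
from typing import Tuple

VAE_DOWNSCALE = 8      # assumption: SDXL's VAE shrinks each side by a factor of 8
NATIVE_SIDE_PX = 1024  # assumption: nominal SDXL training resolution per side


def latent_size(width_px: int, height_px: int) -> Tuple[int, int]:
    """Return the (width, height) of the latent for a given pixel canvas."""
    if width_px % VAE_DOWNSCALE or height_px % VAE_DOWNSCALE:
        raise ValueError("canvas sides must be multiples of the VAE downscale factor")
    return width_px // VAE_DOWNSCALE, height_px // VAE_DOWNSCALE


width_px, height_px = 7168, 2048

pixel_count = width_px * height_px                 # 14,680,064 pixels
megapixels = pixel_count / 1_000_000               # about 14.68
area_ratio = pixel_count / (NATIVE_SIDE_PX ** 2)   # 14.0 times a 1024 x 1024 image
latent_w, latent_h = latent_size(width_px, height_px)  # 896 x 256 latent

print(f"{pixel_count:,} px, {megapixels:.2f} MP, {area_ratio:.1f}x native area")
print(f"latent: {latent_w} x {latent_h}")
```

In other words, each model in the comparison is asked to fill fourteen times the area of a 1024 x 1024px image in one pass.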

Prompt: bold and strong street art style abstract cityscape illustration, masterpiece, urban themes, bright colours

AnythingXL:

Unstable Diffusers - Yamer Mix - NihilMania:

EpiCRealismXL V5-Ultimate:

ZavyChromaXL v6.0:

Aventis Horizon v2:

Tiling as a Coping Mechanism

One common strategy that Stable Diffusion models employ when generating images beyond their training resolutions is "tiling." Tiling involves replicating smaller sections of an image across a larger canvas to fill the space. This approach can produce coherent larger images, but often at the cost of introducing noticeable repetitions and patterns that may not align with the natural or intended aesthetic of the image.
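One simple way to see what tiling looks like numerically is to compare an image with a horizontally shifted copy of itself: if content repeats every N columns, the difference collapses at shifts that are multiples of N. The sketch below demonstrates this on a tiny synthetic grayscale grid in plain Python. It is a toy illustration, not a tool used in these tests; the helper names are hypothetical, and real generations repeat only approximately, so a small threshold would replace the exact-zero check.

```python
from typing import List

Image = List[List[float]]  # rows of grayscale values


def tile_horizontally(patch: Image, repeats: int) -> Image:
    """Build a wide canvas by repeating a small patch side by side."""
    return [row * repeats for row in patch]


def shift_difference(image: Image, shift: int) -> float:
    """Mean absolute difference between the image and itself shifted by `shift` columns."""
    total = 0.0
    count = 0
    for row in image:
        for x in range(len(row) - shift):
            total += abs(row[x] - row[x + shift])
            count += 1
    return total / count


# A 4-column patch repeated 4 times gives a 16-column canvas.
patch = [[0.0, 0.2, 0.9, 0.4],
         [0.7, 0.1, 0.3, 0.8]]
canvas = tile_horizontally(patch, repeats=4)

scores = {shift: shift_difference(canvas, shift) for shift in range(1, 9)}
repeating_shifts = [shift for shift, diff in scores.items() if diff == 0.0]
print(repeating_shifts)  # shifts at which the canvas matches itself exactly
```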

Models fall back on tiling primarily because they cannot invent new detail beyond the scope of the dataset they were trained on. The tiling effect itself, however, is hampered by a lack of pattern comprehension, resulting in poor fidelity or, as demonstrated below, a reduced ability to fill the necessary latent space with detail.

Prompt: ethereal watercolour anime painting of an island forest with a township silhouetted against the background

EpiCRealismXL V5-Ultimate:

The Importance of Pattern Comprehension Development

The key to enabling Stable Diffusion models to generate high-resolution images beyond their training limits lies in enhancing their comprehension of pattern development. Models with a deeper understanding of how patterns form and evolve can more accurately extrapolate these patterns into the untrained latent space, creating images that maintain coherence and fidelity at higher resolutions.

Improving a model's comprehension of pattern development involves training it on a broader range of data or employing techniques that enhance its ability to replicate complex patterns. This could include integrating additional layers or mechanisms specifically designed to analyse and generate patterns, or employing advanced training techniques that encourage the model to learn more nuanced representations of its training data.

Giving the model the capacity to fill latent space with patterned content improves its ability to generate cohesive images with larger, higher-fidelity structures.

Prompt: hyperdetailed, hyperrealistic, masterpiece professional RAW photograph of Australian bushfires

_MOHAWK_ v2.0:

ZavyChromaXL v6.0:

Unstable Diffusers - Yamer Mix - NihilMania:

Proteus v0.4beta:

Aventis Horizon v2:

Prompt: sharp ornate spiritualisation of transient light across the cosmos, magnificent brilliance

Animagine XL v3.1:

EpiCRealismXL V5-Ultimate:

ZavyChromaXL v6.0:

Unstable Diffusers - Yamer Mix - NihilMania:

Aventis Horizon v2:

Unexpected Benefits and Side Effects

The advancements in Aventis Horizon's comprehension capabilities have come with unexpected benefits and side effects that extend beyond its primary function. Notably, the model's enhanced ability to understand and interpret prompts has led to increased operational efficiency and the generation of more cohesive images at lower CFG settings and with fewer processing steps. However, these improvements come with their own set of challenges, particularly concerning the expression of repetitive patterns.

Increased Operational Efficiency: The enhanced capacity for comprehension enables the model to operate more swiftly and effectively than other models, especially at lower CFG and reduced step counts.
(1024 x 1024px, CFG 1, 16 Steps)
Prompt: A magnificent female sailor, cartoon 3d render, full garb, anime kneeling, detailed forest scenery
Animagine XL v3.1:
Proteus v0.4beta:
Unstable Diffusers - Yamer Mix - NihilMania:
_MOHAWK_ v2.0:
Aventis Horizon v2:
Challenge of Repetitive Pattern Expression: Enhanced pattern training comes with its own set of challenges, notably in the expression of repetitive patterns. While Aventis Horizon excels in generating detailed and complex images, it can sometimes manifest a higher propensity for repeating certain patterns or motifs within parts of the image. This side effect is a direct consequence of the model's deepened pattern comprehension, requiring careful management to ensure diversity and naturalness in the visual output.

The dual nature of these developments, combining operational efficiency and enhanced image cohesion with the challenge of managing repetitive patterns, illustrates the complex interplay between innovation and its implications in the field of AI-generated imagery. As Aventis Horizon continues to evolve, addressing these challenges will be crucial for maximizing its potential and pushing the boundaries of what is possible in AI-driven creative expression.
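For readers unfamiliar with what the CFG values quoted in this article control, the sketch below shows the standard classifier-free guidance combination applied at each sampling step: the final noise prediction is the unconditional prediction pushed towards the prompt-conditioned one by the CFG scale. The vectors are made-up stand-ins for model outputs, and the function name is an assumption for illustration.

```python
from typing import List


def cfg_combine(uncond: List[float], cond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance: uncond + scale * (cond - uncond), element by element."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Made-up noise predictions standing in for real model outputs.
uncond_pred = [0.0, 1.0, -2.0]   # prediction without the prompt
cond_pred = [0.5, 2.0, -1.0]     # prediction with the prompt

at_scale_1 = cfg_combine(uncond_pred, cond_pred, scale=1.0)
at_scale_4_5 = cfg_combine(uncond_pred, cond_pred, scale=4.5)

print(at_scale_1)    # identical to cond_pred: no extra push towards the prompt
print(at_scale_4_5)  # the prompt direction is amplified 4.5 times
```

At a scale of 1 the result is simply the conditioned prediction, so the prompt receives no extra amplification from guidance. That is what makes coherent output at CFG 1 a demanding test of how well a model follows the prompt on its own.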

Conclusion

In Aventis Horizon, the implementation of tech merges and model merging is carefully calibrated to ensure that the enhanced model not only excels in generating high-resolution images but also maintains a high degree of emotional and contextual coherence. This is achieved through a meticulous process where specific LoRAs are developed and integrated to refine the model’s ability in key areas such as style refinement, emotional coherence, and pattern development.

Moreover, the process involves the strategic selection of complementary SDXL models for merging, focusing on those that bring distinct advantages to the table, such as improved general coherence or enhanced capabilities in handling mixed media and complex repetitive structures. The result is a model that stands at the cutting edge of AI image generation, capable of producing work that pushes the boundaries of creativity and resolution.
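The simplest form of checkpoint merging is a weighted sum of matching tensors from two models. The sketch below shows that idea on toy "state dicts" in plain Python. It is a minimal illustration under that assumption, not the specific recipe used for Aventis Horizon; the key name and the ratio are hypothetical.

```python
from typing import Dict, List

StateDict = Dict[str, List[float]]  # parameter name -> flattened weights


def weighted_merge(model_a: StateDict, model_b: StateDict, ratio: float) -> StateDict:
    """Return (1 - ratio) * model_a + ratio * model_b for every shared parameter."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    if model_a.keys() != model_b.keys():
        raise ValueError("both checkpoints must contain the same parameter names")
    merged: StateDict = {}
    for name in model_a:
        a, b = model_a[name], model_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [(1.0 - ratio) * x + ratio * y for x, y in zip(a, b)]
    return merged


# Toy checkpoints with a single hypothetical parameter.
checkpoint_a = {"layer.weight": [0.0, 2.0]}
checkpoint_b = {"layer.weight": [1.0, 4.0]}

blend = weighted_merge(checkpoint_a, checkpoint_b, ratio=0.25)
print(blend)
```

A ratio of 0 returns the first model unchanged and a ratio of 1 returns the second, so the ratio acts as a dial between the strengths of the two source checkpoints.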

Aventis Horizon v2 sets a new standard for what is achievable in AI-driven art and design, offering unprecedented opportunities for creativity and innovation in the field.
