Type | |
Stats | 2,387 |
Reviews | (302) |
Published | Aug 6, 2023 |
Base Model | |
Trigger Words | *read notes* |
Hash | AutoV2 D9A25BD6E4 |
Introduction
This is a model focused on more western styles, in particular comics. Some of the main ideas behind the model was to be able to produce multiple different styles and to be able to produce multiple different faces. For this reason, names of artists or people and styles often have a stronger effect than other models that aim at producing a single style. Colors tend to be relatively vibrant in most cases. It can mostly pass the model evaluation tests mentioned here, but the "base style" of the model is an illustration with some painterly effects and realistic proportions.
For the sample images I tried using some prompts used in other model's showcase, no LoRAs and few to no negative prompts, to try to show what the model does. Also, all images were generated using the same seed, so they are not very cherry-picked.
Good to know
It may not be too different from using other models, but in additional to prompts used normally, image tags had two main parts besides the description tags: "style" and "subject".
The style part had a simple structure that looks like:
<epoch> <genre> <medium> <form> by <artist>
Where each component was something like:
epoch: classic, vintage, retro, retro futurism, 40s, 50s, 60s, 70s, 80s, 90s, 2000s, modern
genre: fantasy, urban fantasy, medieval fantasy, asian fantasy, medieval, scifi, cyberpunk, steampunk, dieselpunk, solarpunk, samuraipunk, wizardcore, witchcore, noir, art nouveau, pin-up, post-apocalyptic, futuristic, concept art, grotesque, horror
medium: oil, airbrush, pencil, watercolor, cell shading, gouache, digital art, acrylic, charcoal, pastels, ink, matte, collage, mosaic, encaustic, pixel art, vector art, acuarela
form: comic, cartoon, graphic novel, animation, storybook, impasto, pseudo-impasto, sketch, drawing, illustration, painting, wax, anime, manga, lineart
artist: it's a long list, but some of them are:
comics: Alan Davis, Jay Anacleto, Jim Lee, Mike Deodato, Jean Giraud, Neal Adams, Mike Mignola, Joe Madureira, Mario Alberti, David Finch, Hubert de Givenchy, Todd McFarlane, Stephan Martinire, Pepe Larraz, Paolo Roversi, Patrice Murciano, Pascal Blanche, Frank Miller, Alex Horley, Krenz Cushart, Hollie Mengert, Andy Kubert, Vittorio Giardino, Stanleylau, Raphael Lacoste, Andreas Rocha, James C. Christensen, Alex Ross, Greg Staples, J Scott Campbell, Todd McFarlane, Akiman, James Daly, Bruce Timm
fantasty: Boris Vallejo, Frank Frazetta, Julie Bell, Gerald Brom, Michael Whelan, Keith Parkinson, Tony Sart, Anato Finnstark, Randy Vargas, Diego Gisbert Llorens, Johan Grenier, Bayard Wu, Marc Simonetti, Marc Brunet, Don Bluth, Peter Mohrbacher, Clint Cearley, Magali Villeneuve, Sam Burley, Algenpfleger, JohnoftheNorth, UdonCrew, Yongjae Choi, Shieldmaiden, Wylie Beckert, Jason A. Engle, d1eSELxxxx, Chris Rallis, Stanton Feng, Zezhou, Ed Blinkey, Atey Ghailan, Jeremy Mann, Greg Manchess, Antonio Moro, Dan Mumford, Luis Royo, Viktoria Gavrilenko
horror: Dariusz Zawadzki, H.R. Giger, Anton Semenov
other: Yoshitama Amano, Masamune Shirow, Greg Rutkowski, artgrem, loish, wlop, nixeu, Kuvshinov Ilya, cutesexyrobutts, Anne Bachelier, Yoji Shinkawa, Akihiko Yoshida, Ross Tran, Tsutomu Nihei, Ed Roth, Andrew Wyeth, Wonkeyman, Larry Rivers, Kinu Nishimura, Ayami Kojima, Masashi Kishimoto, Kaethe Butcher, Hajime Sorayama, Greg Tocchini, Virgil Finlay, Alexis Franklin, Kiko Rodriguez, Georgia O'Keeffe, Alberto Seveso, The Rusted Pixel, Yuko Shimizu
A few notes on these:
All the components are optional, they can be added to the prompt as needed.
Some tags are stronger than others. There were more originally, but the effect is too weak or they got mixed with other tags.
The artist styles are not identical to the original artists, but they can help control the direction of the results.
Each of these elements can influence different parts of the images (composition, coloring, medium, style, etc). They can be used to reinforce these parts or change them in a different direction. For example, using the "comic" form with a comic artist will reinforce the style, but using the "impasto" medium with a comic artist will produce a combination. This also means that each one will be more noticeable when prompts are shorter.
The mediums will not necessarily be realistic, since they have been pushed in the direction of comics/fantasy illustrations, but they can help move the results closer to that style.
impasto and pseudo-impasto can help generate more fantasy (less comic) results.
The subject part is based on an extended dataset from the one for "Dungeons and Diffusions" by 0xJustin , including comics, concept art, illustrations, manga and others from multiple different artists. Similar to the style prompt, the subject prompt also can be used with a simple structure of:
<race> <gender> <class>
Where the tags looked like:
race: oni, aasimar, air_genasi, demon, dragonborn, drow, dwarf, earth_genasi, gnome, elf, firbolg, fire_genasi, goblin, goliath, halfling, human, kobold, lizardfolk, orc, tabaxi, tiefling, warforged, water_genasi
class: artificer, bard, barbarian, berserker, black knight, cleric, cyborg, defender, druid, fighter, knight, lancer, mage, monk, ninja, noble, paladin, rogue, samurai, sorcerer, townsperson, valkyrie, warlock, warrior, wizard
You can also try adding a "culture", but they are often overriden by other tags.
culture: Celtic, Nordic, Amazonian, Aztec, Chinese, Japanese, African, Persian, Viking, Indian
The gender option will push results towards humans, so it might be a bit luck-based.
A simple prompt to test would be in the form of:
<subject>,
<view>,
<style>
Where view would be something like upper body or portrait, with no negative prompt or fixes (like Hires. fix) and then build from there. When using Hires. fix, negative prompts might be needed more frequently.
Most of my generation tests were done without Hires. fix, since it takes a long time to upscale even by 1.2 scale, but the images in the showcase were made using it.
Issues
It can't do realistic photograph images or 3D renders (it will very probably look like a realistic painting at best) and for anime it can do the coloring, but the proportions are more difficult to get. Some manga and anime styles were included, but the characteristic large eyes and facial proportions would need some work or external help (ie: LoRAs).
It has some issues with eyes, probably because images with colored sclera and images with small faces were used during training.
It may have a tendency to generate speech bubbles, comic book covers (with logos and text) and other texts in some instances.
How it was made
The way it was trained is a bit convoluted. It started being trained somewhere in December 2022, based on a merge of some models from that time that produced images closer to the desired style and with an extended dataset from the one used for "Dungeons and Diffusions" by 0xJustin, first trying to add more styles and comics and games styles, but it didn't work as well as expected. Then it was made into two different models (fantasy and games/comics), but they also didn't come out great. The main problem was that the different styles kept affecting each other, so it didn't seem to be going too far.
As time passed, more and better models kept coming out and eventually it didn't seem like there was a point to continue with them with the few resources I have. After some tests and manual per-layer merging of the trained models, one combination was able to produce comics results and still understand some of the fantasy concepts (classes, races, etc). This combination was then briefly trained again to try to fix some composition issues and it's the current version of the model. It might not be as good as other popular models, but it may be interesting to try out.