Evolution of Text-to-Image: 2022

Go to year: 2022 | 2023 | 2024 (pt 1) | 2024 (pt 2)


Introduction

I've been following the progress of text-to-image since before models became available to the public. I've tried most of what has been available as it came out. It's amazing how rapidly the quality has improved over only a few years. I got the idea for this project because I couldn't find a convenient timeline showing comparative examples of how images have advanced. So I decided to make it myself and share it with you.

Each article in this series shows a grid of images produced by each model. The models are listed chronologically, with one article per year.

Prompts and project details can be found at the bottom of the article. High resolution versions of the comparison image grids are in this article's attachment.

The Models

Before 2022

The first text-to-image models available to the public that I know of were the CLIP-guided models. These had names like Deep Daze, Big Sleep, Aphantasia, Disco Diffusion, and Jax Guided Diffusion. Some of them are still available for download as command-line generators or as Google Colab notebooks. Two of the earliest that are still easily accessible online are VQGAN+CLIP (Artistic) and CLIP-Guided Diffusion (Coherent) on NightCafe.

There aren't enough 2021 models to warrant their own article, but I'd like to show a couple of examples so you can see where we started.

VQGAN+CLIP

April 2021

This was originally available in Katherine Crowson's Google Colab Notebook. It was one of the first text-to-image models made available for the public to experiment with. This is known on NightCafe as the "Artistic Model".

✅free download (github)

CLIP Guided Diffusion

June 2021

Another model by Katherine Crowson. This was the first diffusion model that worked with CLIP. It's known as the "Coherent Model" on NightCafe.

✅free download (github)

Midjourney 1

February 2022

We kick 2022 off with the first version of Midjourney, a subscription text-to-image service only available on Discord.

❌free download

DALL-E 2

April 2022

A couple of months later, OpenAI released DALL-E 2, the first version of DALL-E open to the public. It used a paid credit system.

❌free download

DALL-E Mini (Craiyon)

April 2022

This was an attempt at making a freely available text-to-image generator similar to DALL-E. Although it took its inspiration from OpenAI's DALL-E, it was made by a different creator. It was later renamed Craiyon to prevent confusion. It's known for producing images that can be bizarre and funny.

❌free download

Midjourney 2

April 2022

Midjourney released their first version update in April 2022.

❌free download

Midjourney 3

July 2022

Another Midjourney update offering some minor improvements over the previous version.

❌free download

Stable Diffusion 1.4

August 2022

Stable Diffusion 1.1, 1.2, 1.3, and 1.4 all came in August 2022. Version 1.4 was the first version released to the public.

✅free download (huggingface)

Stable Diffusion 1.5

October 2022

A couple of months later we got 1.5. The ease of training this model on new subjects and styles made it the most popular version of Stable Diffusion until SDXL came along.

✅free download (huggingface)

Midjourney 4

November 2022

Midjourney 4 was a major leap in text-to-image quality. It also doubled the side length of Midjourney's generated images, from 256x256 to 512x512, which is four times as many pixels.

❌free download

Stable Diffusion 2.0

November 2022

Stability AI released a new version of Stable Diffusion trained on 768x768 images. Unfortunately, it was heavily censored and was a less flexible model than 1.5. It never became popular in the community.

✅free download (huggingface)

Stable Diffusion 2.1

December 2022

A month later 2.1 was released, but it was too little, too late. Its popularity never picked up.

✅free download (huggingface)

Midjourney Niji 4

December 2022

To close out the year, we have Midjourney's first Niji model. It was a collaboration between Midjourney and Spellbrush to specialize in anime and illustration styles.

❌free download

Project Details

Disclaimer

I'm not an insider with special access to anything or a programmer who understands how all this works under the hood. I took some time to research, but this is from information found online and I can't guarantee everything is accurate. This is a work in progress; I'm still working on filling in missing information.

Also note that this is only a comparison of base models. Some models can produce significantly better images by using trained checkpoints, styles, presets, or detail enhancers.

Criteria

  • Must still be publicly accessible in 2024 without a complicated setup.

  • For this series, I've excluded turbo/fast versions of the models.

Process

  • I chose 15 prompts that cover a variety of photorealism, art styles, people, animals, objects, specific instructions, open-ended short prompts, text, and abstract concepts.

  • All images come from the first generation set, which gave me between one and four images to pick from.

  • When possible, I used the same seed, which helps show the differences between minor versions of the same model.

  • I used the recommended settings for each model or the default offered online.

  • I didn't use additional styles or presets.
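
For anyone who wants to repeat a comparison like this, the process above can be sketched as a simple job list: one job per model and prompt, all sharing a seed and limited to the first batch of up to four images. This is only a rough illustration and not the tooling used for this article. The names `Job`, `build_jobs`, `FIXED_SEED`, and `MAX_BATCH_SIZE` are hypothetical, the seed value is made up, and the sketch does not call any real image generator.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical values for illustration only; the article does not state
# which seed or settings were actually used.
FIXED_SEED = 12345
MAX_BATCH_SIZE = 4  # the first generation set holds at most four images


@dataclass(frozen=True)
class Job:
    model: str
    prompt: str
    seed: int
    batch_size: int


def build_jobs(models, prompts, seed=FIXED_SEED, batch_size=MAX_BATCH_SIZE):
    """Return one job per (model, prompt) pair, all sharing the same seed."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError("batch_size must be between 1 and 4")
    return [Job(m, p, seed, batch_size) for m, p in product(models, prompts)]


# Two models and two prompts taken from this article.
models = ["Stable Diffusion 1.4", "Stable Diffusion 1.5"]
prompts = ["artificial intelligence", "woman lying on the grass"]

jobs = build_jobs(models, prompts)
for job in jobs:
    print(job)
```

Keeping the seed in one place makes it easy to see that only the model changes between two jobs with the same prompt, which is what makes minor-version comparisons meaningful.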

Prompts

  • african hydropunk princess

  • artificial intelligence

  • astronaut exploring an alien planet

  • overhead view of a breakfast plate with eggs, toast, strawberries, coffee, and a fork

  • exterior of a cafe watercolor painting

  • person wearing cyberpunk accessories in a high tech neon city

  • druid man character design

  • ethereal fairy in the style of oil painting

  • graphic design logo with fennec fox and succulents and text "Desert Design"

  • man and a woman in love

  • photo of a deer in an enchanted forest with cinematic lighting

  • Photo portrait of a woman with long black curly hair in natural light. She's wearing a fashionable purple blouse, a gold necklace with a locket, and hoop earrings. Bokeh background.

  • pixel art city street scene with shops and pedestrians at night

  • red potion bottle with text "health" on the left, blue potion bottle with text "mana" in the middle, green potion bottle with text "poison" on the right, on a wooden table in a dark alchemist's laboratory, in the style of a detailed digital painting

  • woman lying on the grass

Article Updates

  • Nov 25, 2024: I fixed an issue with a wrong image being on the Stable Diffusion 1.5 grid. The attachment has also been updated.

  • Nov 26, 2024: Added navigation links. Added download links.

