TheAlly's 100% Beginner Guide to Getting Started in Generative AI Art

Author: theally

Updated: Oct 30, 2023

Uploaded: Mar 22, 2023

Hey!

This Guide & The Author

I'm TheAlly! You might have seen my content around here - I produce and host a diverse range of stuff to help boost your image creation capabilities. I've released some of the most popular content on Civitai, and am constantly pushing the boundaries with experimental and unusual projects.


This guide is aimed at the complete beginner - someone who is possibly computer-savvy, with an interest in AI art, but doesn’t know where to look to get started, or is overwhelmed by the jargon and huge number of conflicting sources.

This guide is not going to cover exactly how to start making images - but it will give you an overview of some key points you need to know, or consider, plus information to help you take the first steps of your AI art journey.

Generative AI, & Stable Diffusion

So what is “Generative AI”, and how does Stable Diffusion fit into it? You might have heard the term Generative AI in the media - it’s huge right now; it’s on the news, it’s on the app-stores, Elon Musk is Tweeting about it - it’s beginning to pervade our lives.

Generative AI refers to the use of machine learning algorithms to generate new data that is similar to the data fed into it. This technology has been used in a variety of applications, including art, music, and text generation. The goal of generative AI is to allow machines to create something new and unique, rather than simply replicating existing data.

Stable Diffusion is one example of generative AI that has gained popularity in the art world, allowing artists to create unique and complex art pieces by entering text “prompts”.
GPT-3/4 (ChatGPT) is another example of generative AI - a language model that can generate human-like text. It is capable of completing sentences, paragraphs, and even entire articles, given a short prompt. This technology is being used in a variety of applications, including chatbots, content creation, and even computer programming. I used it to write this paragraph in ~1 second.

This guide will specifically cover Stable Diffusion, but will touch on other Generative AI art services.

The Basics

In mid-2022, the art world was taken by storm by the launch of several AI-powered art services, including Midjourney, Dall-E, and Stable Diffusion. These services and tools use cutting-edge machine learning technology to create unique and innovative art that challenges traditional forms and blurs the lines between human and machine creation.

The impact of AI art on the industry has already been significant. Many artists and enthusiasts are exploring the possibilities of this new medium, while many fear the repercussions for established artists' careers. Many art portfolio websites have developed new policies that prohibit the display of AI-generated work. Some websites require artists to disclose if their work was created using AI, and others have even implemented software that can detect AI-generated art.

The Companies

There are many big players in the AI art world - here are a few names you'll often see mentioned:

OpenAI - A research laboratory with both for-profit and non-profit subsidiaries, focused on developing AI in an open and responsible manner. Founded in 2015 by technology investors (including Peter Thiel and Elon Musk), OpenAI has created some highly advanced generative AI models, such as GPT-3 and the recently announced GPT-4, which are highly regarded for their language processing and generation abilities.
Stability AI - The world’s leading open source generative AI company - the brainchild of CEO Emad Mostaque, Stability AI is a technology start-up focused on open source releases of tools, models, and resources. Stability AI is behind the 2022 releases of the Stable Diffusion and Stable Diffusion 2.0 text-to-image models.
RunwayML - One of the companies behind Stable Diffusion, RunwayML now provides a platform for artists to use machine learning tools in intuitive ways without any coding experience.

Controversies

There are already a number of lawsuits challenging various aspects of the technology. Microsoft, GitHub and OpenAI are currently facing a class-action lawsuit, while Midjourney and Stability AI are facing a lawsuit alleging they infringed upon the rights of artists in the creation of their products.

Whatever the outcome, Generative AI is here to stay.

How does Stable Diffusion Work?

That is an incredibly complex topic, so we’ll only touch on it briefly here, at a very high level:

(Forward) Diffusion is the process of gradually adding random noise to the pixels of an image until it no longer resembles the original and is 100% noise - we’ve diffused, or diluted, the original image. A model that learns to reverse that process can start from noise and produce something similar to the images it was trained on. There is obviously a lot more going on in the process, but that’s the general idea. We input text, the “model” processes that text, uses it to guide the removal of noise from a “diffused” image, and displays an appropriate output image.

Simple! (because that's not really what's happening, don't @ me - I know)
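
To make the "adding noise" half of that idea a little more concrete, here is a minimal sketch in plain Python. It is not how Stable Diffusion is actually implemented; the blending rule (square roots of a "keep" fraction) is a commonly used formulation that I'm assuming here, and the names add_noise and keep are made up for illustration.

```python
import math
import random


def add_noise(pixels, keep):
    """Blend pixel values with Gaussian noise.

    keep=1.0 returns the original pixels unchanged,
    keep=0.0 returns pure noise with nothing of the image left.
    """
    signal_weight = math.sqrt(keep)
    noise_weight = math.sqrt(1.0 - keep)
    return [signal_weight * p + noise_weight * random.gauss(0.0, 1.0)
            for p in pixels]


if __name__ == "__main__":
    image = [0.9, 0.1, 0.5, 0.7]  # a tiny 4-pixel stand-in for an image
    # Step from "all image" to "all noise", as forward diffusion does.
    for keep in (1.0, 0.75, 0.5, 0.25, 0.0):
        noisy = add_noise(image, keep)
        print(keep, [round(v, 2) for v in noisy])
```

Running it prints the same four "pixels" getting progressively drowned out. The generation side works in the opposite direction: a trained model repeatedly estimates and removes the noise, steered by your text prompt.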

How can I make Stable Diffusion Images?

There are a number of tools to generate AI art images, some more involved and complex to set up than others. The easiest method is to use a web-based image generation service, where the code and hardware requirements are taken care of for you, but there’s often a fee involved.

Alternatively, if you have the required hardware (ideally an NVIDIA graphics card), you can create images locally, on your own PC, with no restriction, using Stable Diffusion.

When we talk about Stable Diffusion, we’re talking about the underlying mathematical/neural network framework which actually generates the images. We need some way to interface with that framework in a user-friendly way - that’s where the following tools come in:

To run on your own PC - Local Interfaces

This guide is extremely high level and won’t get into the deep technical aspects of installing (or using) any of these applications (I will be posting an extremely in-depth guide at a later date), but if you’d like to run Stable Diffusion on your own PC there are options!

Note that to get the most out of any local installation of Stable Diffusion you need an NVIDIA graphics card. Images can be generated using your computer’s CPU alone, or on some AMD graphics cards, but the time it will take to generate a single image will be considerable.
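
If you're not sure what hardware you have, a rough check like the one below can help. It is only a heuristic sketch: it assumes the NVIDIA driver is installed and has put its nvidia-smi command-line tool on your PATH, which is the usual case but not guaranteed. The function names are made up for illustration.

```python
import shutil


def likely_has_nvidia_gpu() -> bool:
    """Heuristic: the NVIDIA driver normally installs the nvidia-smi tool."""
    return shutil.which("nvidia-smi") is not None


def describe_setup() -> str:
    """Turn the check into a short, human-readable hint."""
    if likely_has_nvidia_gpu():
        return "NVIDIA driver tools found: local generation should be practical."
    return "No NVIDIA driver tools found: local generation may be slow on this machine."


if __name__ == "__main__":
    print(describe_setup())
```

A "not found" result doesn't prove there is no NVIDIA card (the driver may simply be missing), so treat it as a hint rather than a verdict.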

Automatic1111’s WebUI (Complexity factor ⭐⭐⭐⭐/5) - WebUI is the most commonly used Interface for Stable Diffusion. It is moderately complex, and has a wide range of plugins and extensions to extend the experience. There’s a great deal of community support available if you have problems.
ComfyUI (Complexity factor ⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐/5) - ComfyUI is relatively new to the scene, and provides an exceedingly complex node-based workflow workspace which requires in-depth knowledge of the Stable Diffusion image generation process to make work. Definitely not a beginner interface, but extremely powerful for the experienced user.
Cmdr2’s Easy Diffusion (Complexity factor ⭐⭐/5) - A great option for those starting out with a local install. Easy Diffusion has a 1-click installer for Windows, and a popular Discord server full of extremely knowledgeable people to help you get up and running. The interface itself is limited in what it can do, compared to the other Interfaces, but it remains the easiest way to get started making your own images, locally.
InvokeAI (Complexity factor ⭐⭐⭐/5) - A popular open-source text-to-image and image-to-image interface with powerful tools, not yet as full featured as Automatic1111’s WebUI, but getting close.

To run on your own Mac - Local Interfaces

Mac owners can run Automatic1111’s WebUI and InvokeAI, as well as two Mac-specific Interfaces:

DiffusionBee (Complexity factor ⭐/5) - DiffusionBee is a popular, extremely lightweight, and super simple to use macOS interface for Stable Diffusion. It allows for basic image generation, but has a very small feature-set, to keep it as simple as possible.
Draw Things App (Complexity factor ?/5) - Draw Things is a popular and highly rated macOS app. I don't know much about it, but anecdotally it seems to have some good features!

To run via an Image Generation Service

There are many websites appearing which allow you to create Stable Diffusion images if you don’t want the fuss of setting up an interface on your local PC, or if your computer hardware can’t support one of the above interfaces.

Prodia - Prodia is an easy to use interface for Stable Diffusion, with access to a few popular models. Images can be generated here for free without a cap on the number, but advanced features require a paid subscription.
Mage.space - Mage.space is a fully featured interface with a host of advanced settings. Images can be generated for free (with an account), but more in-depth control requires a paid subscription.
Nightcafe - Nightcafe Studio is a popular AI art generator with a large community of followers, offering a range of options for free, or for earnable credits.
Dall-E 2 - One of the first image generator tools, now overtaken a little in terms of functionality and image quality. Users get 15 free generation credits per month.
Midjourney - Not technically a Stable Diffusion implementation - slightly different technology, doing the same thing! Midjourney produces extremely distinctive images and has a huge following.

An example of Midjourney generated artworks.

I now have an interface (or have chosen a Generation Service)! What are “models”?

Checkpoints, also known as “weights” or “models”, are part of the brains which produce our images. Each model can produce a different style of image, or a particular theme or subject. Some are “multi-use” and can produce a mix of portrait, realistic, and anime (for example), and others are more focused, only reproducing one particular style or subject.

Models come in two file types. It’s important to know the distinction if running a local Stable Diffusion interface, as there are security implications.

Pickletensor (.ckpt extension) models may contain malicious code, which can execute when the model is loaded. Many websites, including Civitai, have “pickle scanners” which attempt to scan for malicious content. However, it’s safer to download Safetensor (.safetensors extension) models when available. This file type stores only the model data and cannot contain executable code, so it is much safer to download and use.
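
For the curious, the sketch below shows why pickle-based files are risky and roughly what a "pickle scanner" looks for. It uses only Python's standard pickle and pickletools modules. The Payload class and imported_names function are hypothetical names for this illustration, the smuggled call is a harmless print, and real scanners (including whatever Civitai runs) are considerably more thorough than this.

```python
import pickle
import pickletools


class Payload:
    """Stand-in for a booby-trapped object; the call it smuggles in is harmless."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call print(...)". A real attack
        # would name something far more dangerous than print.
        return (print, ("this ran during unpickling",))


def imported_names(data: bytes) -> list:
    """List the module.name references a pickle asks the loader to import.

    Only reads the opcodes; nothing is executed. A rough heuristic.
    """
    names = []
    recent_strings = []  # string arguments seen so far, for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Older protocols store the reference as "module name".
            names.append(arg.replace(" ", ".", 1))
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Newer protocols push module and name as two strings first.
            names.append(".".join(recent_strings[-2:]))
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return names


if __name__ == "__main__":
    blob = pickle.dumps(Payload())
    print(imported_names(blob))  # inspect only, nothing is executed
    pickle.loads(blob)           # loading runs the smuggled print call
```

The point is that simply loading a pickle can call arbitrary functions, which is why a scanner flags unexpected imports before anything is loaded.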

Note that if using a Generation Service you will only be able to use the models they provide. Some services provide access to some of the most popular models while others use their own custom models. It depends on the service.

Along with models there are many other files which can extend and enhance the images generated by the models, including LoRA, Textual Inversion, and Hypernetworks. We’ll look at those in a more in-depth guide.

Where do I get models?

Most Stable Diffusion interfaces come with the default Stable Diffusion models, SD1.4 and/or SD1.5, possibly SD2.0 or SD2.1. These are the base Stable Diffusion models from which most other custom models are derived, and they can produce good images with the right prompting.

Custom models can be downloaded from the two main model repositories:

Civitai - You are here! Civitai is the leading model repository for Stable Diffusion checkpoints, and other related tools. There are tens of thousands of models to choose from, across many categories; something for everyone!

Hugging Face Model Hub - Hugging Face has a wide variety of txt2img models, but finding models you’d like to try is often a challenge, as the interface is not the most user-friendly for browsing.

Other Generative AI Services?

Generative AI is a huge field, with many applications. Some of the most popular and interesting tools right now are:

ChatGPT - Mentioned above, ChatGPT is what’s known as an LLM (Large Language Model), designed to provide conversational responses to input text, understand and answer questions, provide recommendations, generate content, and more. It can solve problems, write code - it’s extremely useful, and free (with limitations). The first ChatGPT-like LLMs that can run locally are now appearing, and I will post a tutorial on my Patreon soon, covering their use.
Riffusion - Riffusion generates music from text prompts, rather than images! You can ask for your favorite style - or instrument - or ambient sounds, in any combination or beat, and get some really wonderful outputs. You can run Riffusion from the website, or alternatively, there is a way to run it locally from the Automatic1111 WebUI interface.