If you’re new to machine learning and image generation, or to Stable Diffusion and Civitai – welcome! I’ll try to explain quickly, but you can also check out the DreamBooth project page (https://dreambooth.github.io).
Google’s work on DreamBooth showed how a text-to-image model can learn a subject from a handful of images and then place it in brand-new scenes from a text prompt. Work like this has driven massive interest in using AI to create new images from words and phrases, and made it much more accessible. The AI uses your prompt to decide what to generate. The usual example is a cat or a dog, but be descriptive! There are lots of FAQs and tutorials out there on how to use SD, no matter which version or flavor of the AI you’re running.
As an experiment, I decided to start running song lyrics through Stable Diffusion 1.5. The results have been rather interesting to see, given the AI’s ability to gather keywords and build scenes from them.
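For anyone curious what that looks like in practice, here is a minimal sketch of lyrics-as-prompt generation using the Hugging Face diffusers library. The checkpoint ID and lyric line are placeholders, not my exact setup; any SD 1.5-based model from Civitai could be swapped in.

```python
# Minimal sketch: feed a lyric line to Stable Diffusion 1.5 as the prompt.
# The checkpoint and lyric below are placeholders, not the exact setup used here.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # any SD 1.5-based checkpoint works
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Feed a lyric line straight in as the prompt.
lyrics = "Raven hair and ruby lips, sparks fly from her fingertips"
image = pipe(lyrics, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("witchy_woman.png")
```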
Each model was taught different keywords and phrases when it was created. The keywords used in training vary with the trainer’s intent – whether the model is meant to be a character generator or NSFW, for example. What works for ArtsHP (https://civitai.com/models/108543/artshp) won’t be the same for DreamShaper (https://civitai.com/models/4384/dreamshaper) or Anything V3 (https://civitai.com/models/66/anything-v3). I have been using ArtsHP for this experiment so far and plan to expand to other models in the near future.
I started with the Rolling Stones’ “Sympathy for the Devil”, the Eagles’ “Witchy Woman”, and the Steve Miller Band’s “Abracadabra.” Each run took keywords from the lyrics and generated images from them. So even though they all draw on the same dataset of text and images, each song produced a different image based on its lyrics. The AI pulls different keywords out of a song depending on where the words sit in it. I have yet to get a generation of “Sympathy for the Devil” that does not have a military or royal subject.
When you switch to “Witchy Woman,” the results lean a bit cliché. The AI takes the chorus and generates a witch in black, drawing on the descriptive line about her appearance: “Raven hair and ruby lips.”
Each song ends up with its own “theme” that the AI builds from the model in use. It’s not a very scientific experiment, but it is pretty interesting to see what else comes out of these lyrics. You still have to go through and fine-tune the prompt writing. Out of hundreds of generations, I usually keep only maybe five to ten images. If you are more artistically inclined or competent at graphic design, you can take them and edit them without having to use inpainting, but that’s a whole other topic to cover.
Part of this was to share some of the experiment and see what else gets created. Remember, though, to look through the lyrics and make syntax edits as needed so the AI reads the words, not the meter or punctuation. Garbage in equals garbage out.
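As a loose example of that cleanup pass, here is one way it might look in Python. The specific rules (dropping section labels, turning line breaks into commas, stripping stray punctuation) are my own assumptions; adjust them to taste.

```python
# One possible cleanup pass for lyrics before using them as a prompt.
# These rules are illustrative, not a fixed recipe.
import re

def lyrics_to_prompt(lyrics: str) -> str:
    text = re.sub(r"\[.*?\]", " ", lyrics)  # drop labels like [Chorus]
    text = text.replace("\n", ", ")         # line breaks become soft separators
    text = re.sub(r"[^\w\s,]", "", text)    # strip punctuation the AI trips over
    text = re.sub(r"\s+", " ", text)        # squeeze repeated whitespace
    text = re.sub(r"(,\s*)+", ", ", text)   # collapse runs of commas
    return text.strip(" ,")

print(lyrics_to_prompt("[Chorus]\nWoo hoo, witchy woman\nSee how high she flies!"))
# -> Woo hoo, witchy woman, See how high she flies
```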
At the bottom I’ve popped a few generated images from other songs. Can you guess which song the AI was working from, without using PNG Info to trace it back? Some are obvious; some are very interesting. (If you do want to trace one back, there’s a quick sketch after the list.) Some of the song lyrics used were:
“Sympathy for the Devil” – The Rolling Stones
“Witchy Woman” – The Eagles
“Abracadabra” – The Steve Miller Band
“The Chain” – Fleetwood Mac
“Who Are You” – The Who
“Baba O’Riley” – The Who
“Rain on the Scarecrow” – John Mellencamp
“Every Rose” – Bret Michaels
“Poison” – Alice Cooper
“Fortunate Son” – Creedence Clearwater Revival
“Heroes” – David Bowie
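If you do want to trace an image back programmatically instead of opening it in the web UI’s PNG Info tab, here is a rough sketch using Pillow. It assumes the image was saved by a tool (like the AUTOMATIC1111 web UI) that writes the generation parameters into the PNG’s text metadata under a “parameters” key; other tools may use different keys or embed nothing at all.

```python
# Rough sketch: read generation parameters back out of a PNG's text metadata.
# Assumes a tool such as the AUTOMATIC1111 web UI stored the prompt in the
# "parameters" text chunk; other tools may use different keys or none at all.
from PIL import Image

img = Image.open("generation.png")   # path is a placeholder
params = img.info.get("parameters")  # the same data PNG Info displays
print(params if params else "No embedded prompt found.")
```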