
Synthesizing Training Data with ChatGPT


I was able to create a model for a concept that had almost no viable training data available. All of the training data was created specifically for this purpose using ChatGPT and low-resolution screenshots.

TL;DR

  • If you don't have enough viable training data but do have examples (even if they're too low resolution) and the concept is simple, you can try to teach a GPT to create the training data for you.

  • If you lack diversity in the training data you can have a GPT remake an image in a different style to add it to your dataset (for example: for a dataset that only has realistic images, you can have it create variations that are paintings/sketches of an example image).

Here's how I did it:


Nebulous Hundun

I've never played Teamfight Tactics, I'm probably not going to ever either. I was watching someone play it and I saw this little guy running around on the screen.

I asked him what it was.

It's called a Nebulous Hundun.

I wanted to make a model of it.


So I searched for images of it and there's basically just one viable image of it online for training data. I thought about asking him to zoom in on it and take a bunch of screenshots of it. At the time, I hadn't used Illustrious XL and also didn't know that it might be viable to use a dataset like this, so long as I tagged them all with "3D". So I gave up on the idea.


As of the time of writing (04/03/25) (03/04/25 if you're European), ChatGPT got an update to its image generation, which now uses their own model where it used to use DALL-E. I got the idea to see if I could get GPT to create images I could use as training data. I figured it was probably possible because the character's design is relatively simple:


Its first attempt was close. I felt like I could get it there with just a little more adjustment to the shape and by not depicting the mouth unless it's open.

I looked up a video showcasing the Nebulous Hundun. The images I gave it are extremely low resolution; I just screenshotted them from a YouTube video. It was also at this time I realized that he has six legs (which I'll get into later).

I reviewed its understanding and I was satisfied.

Each attempt was getting closer to it. I just kept correcting it and making sure it understood.

I felt like this one was 90% there.

This is when I realized that depicting 6 legs was going to be troublesome. ChatGPT does have a feature that allows you to "inpaint" a selected region, but it has its limitations: oftentimes it will affect the composition of the rest of the image, and it doesn't give you nearly as much control as if you were doing it within Forge or Automatic1111.

There were several more instances of me going back and forth, not shown here, before it really nailed the concept. I gave up on the 6 legs. It wasn't really able to get that right on the first try, and the more I focused on it, the more the other aspects began to degrade. If I was truly dedicated to recreating a glorified cursor skin that most people don't know about, I probably would have just given up on getting the 6 legs out of ChatGPT here, taken it into MS Paint, copied and pasted extra legs where they are supposed to be, then brought it into Forge to inpaint my MS-Paint-back-alley-surgery-job-because-I-don't-know-how-to-use-Photoshop into blending together for the entire dataset. However, I was not that committed, so I just settled for four legs.

Every once in a while, it would "forget" certain traits, but it was pretty easy to get back on track and I was able to develop a usable dataset.

The next try got it back on track.

I then told it to come up with a list of scenarios we could put it in and art styles we hadn't done yet, then had it go down the list, combining a scenario with an art style one-by-one, so I could get diversity in its context as well as the art style.
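If you'd rather plan the combinations yourself instead of letting GPT walk its own list, the pairing can be sketched in a few lines of Python. This is only a rough sketch of the idea, not what I actually ran: the scenario and style lists, the prompt wording, and the `build_prompts` helper are all made up for illustration, and you would still paste each prompt into the chat by hand.

```python
# Hypothetical example lists; swap in whatever scenarios and styles you brainstormed.
SCENARIOS = ["sitting in a forest", "riding a skateboard", "sleeping on a cloud"]
STYLES = ["watercolor painting", "pencil sketch", "pixel art", "claymation"]


def build_prompts(subject, scenarios, styles):
    """Pair each scenario with an art style, one by one.

    If one list is shorter than the other, it wraps around so that
    every scenario and every style is used at least once.
    """
    if not scenarios or not styles:
        return []
    count = max(len(scenarios), len(styles))
    return [
        f"{subject}, {scenarios[i % len(scenarios)]}, {styles[i % len(styles)]} style"
        for i in range(count)
    ]


if __name__ == "__main__":
    for prompt in build_prompts("Nebulous Hundun", SCENARIOS, STYLES):
        print(prompt)
```

Pairing down the lists (instead of generating every scenario with every style) keeps the number of images manageable while still making sure no single scenario or style dominates the dataset.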


Here are some of the examples:

I worked in some specific requests as well.


Now that it seemed to have the idea locked in, I wondered if I could take it to realism, which I figured would be one of the harder things I could try to do (besides getting 6 legs consistently).

I think it nailed it first try.

Another one.


I've only done this one time, but I think success depends on several factors:

  • if the character design is relatively simple / has characteristics that it is familiar with

  • if it has "wrong" features - I think that it struggles with 6 legs because it "knows" that a furry creature with 6 legs is "wrong"

I've done some light testing with recreating a fictional human face, giving it an image I generated for one of my Project Odyssey submissions to see if I could create a dataset for a character. This did not work nearly as well as what I've shown so far. I haven't gone that far with it yet. When I started chastising and berating it for not doing it correctly, it was able to understand that it did not reproduce the exact likeness and what exactly the differences were, but recreating a photorealistic facial structure is not something I've been able to do with it yet.


Realistic CivChan

I wanted to do some more testing on a new concept. It began as using GPT to diversify a dataset, but then I realized this is something different: an anime depiction doesn't have a clear path to a photorealistic interpretation. This ended up becoming more of my guess of what a photorealistic CivChan would look like (in terms of facial structure).

This is what happens when you haven't been on top of scolding and berating GPT lately.

The first attempt. You'll see later on that the facial structure is all over the place, but I ended up giving it a general idea of how her face should be to have some kind of visual similarity between different depictions.

I was still being way too nice to it here.

I just stuck with this interpretation of its general understanding.

Is this more accurate as a realistic interpretation? I don't know.

At this point, I wouldn't make the claim it all looks like the same person, but it became a lot more consistent than what I was getting before (not all of which was pictured previously).

I would say this is not a 100% accurate representation of that facial expression, but I liked it so I gave it a pass.

Not exactly an original idea, but I began telling it to create images without a strict reference, then I asked for it in a different art style.

When it does a good job, I will say so. When I berate and belittle it later on for messing up, it will be more meaningful then. Although during this chat I was too lazy to do that properly. Do as I say, not as I do. This one would have been perfect if it actually made sense.

Spoiler: it did not, in fact, make those changes to match my vision.

So I gave it a reference image to base it off of. I knew it was going to put random people in the back, so I tried to pre-empt that. I didn't necessarily want the top hat, but once I saw it, I thought it was hilarious.

And this is where I'll end it, with what is truly one of the artworks of all time.


Edit: since I'm back here to fix some of the images, I'm going to shill the short video I smashed together for the CivChan contest just in time, but then completely failed on actually submitting it and only realized the day after lmao:

https://civitai.com/images/67547292
