I'm not an expert in many of the things I dabble in that have a community of experts, but I like to view myself as someone who's creative and resourceful enough to give a seasoned pro something that'll catch them off guard. One example is that I'm part of FIMFiction, and while I'm by no means a master of words, I've presented stories there that range from a series of text messages to animated childlike drawings, formats that many might not have considered when the most likely response is a string of sentences. Here on Civit.ai, my flexibility has become very cuffed with the introduction of the buzz system, as every move I make comes at a cost, which only increases the trouble when you consider that this AI stuff is an entirely new field with plenty left to explore and experiment with. Many combinations are based entirely on preestablished images that get prepped, meshed, blended, and decorated like a dish uniquely different from the one you asked seconds for. But there's one kind of "dish" that's been on my mind for a while now. And that dish is "companion".
Multiple unique characters seem to be a real struggle with these images, as they blend together and often make a single unidentifiable blob. It's not impossible, but it requires particular methods that feel like a Band-Aid, a temporary solution that doesn't solve the problem. The method is to use preassigned locations that split the AI's focus so it won't cross portions of the prompt with one another. It's much like a pair of roommates splitting their quarters so neither can enter the other's territory, and at best, the standard Civit user can use something called a 'BREAK' to accomplish something similar.
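As a rough illustration of the BREAK approach, here's a minimal Python sketch that gives each character its own prompt segment and joins the segments with the BREAK keyword. The function name `build_break_prompt` and the example tags are made up for illustration, and how strictly BREAK keeps the segments apart depends on the generator you're using.

```python
def build_break_prompt(segments):
    """Join per-character prompt segments with the BREAK keyword.

    Hypothetical helper: each segment is meant to describe one character
    (or the shared scene) so their tags are less likely to blend.
    """
    cleaned = []
    for segment in segments:
        # Drop surrounding whitespace and stray commas so the joined
        # prompt doesn't end up with ", BREAK" in it.
        segment = segment.strip().strip(",").strip()
        if segment:
            cleaned.append(segment)
    return " BREAK ".join(cleaned)


# Example tags are placeholders, not tags from any specific LoRA.
prompt = build_break_prompt([
    "2characters, standing side by side",
    "first character, green armor, helmet",
    "second character, blue skin, short hair",
])
print(prompt)
```

The scene goes in the first segment and each character gets a segment of their own, which is the "roommates" split described above.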
This image was a quick render I made with BREAK in the prompt, using a LoRA made by richyrich515: one LoRA, not two. Using multiple LoRAs designed for different characters can begin to blend features, but this single LoRA managed to make two complex characters that are drastically different from one another while staying (mostly) within their official designs. With the help of some specific LoRAs, users have gotten these two to interact in more complex ways and even managed to get them to... "form an alliance" with each other, and while that last bit might seem to disprove what I'm about to talk about, I think the "alliances" can be comprehended by the AI at this point as easily as "facing viewer".
After making my first LoRA of a pocket-sized character, I considered making the next LoRA the companion she's usually carried by, Nigel. As I looked through the images for the guy, I kept finding her flying around and said to myself, "Wouldn't it be neat if I had her as a kind of little accessory, riding on Nigel like the small nymph that she is?" As I was about to do that, I realized there could be a huge problem getting it to translate well. Size difference, close-up details, and her regular issue of making physical contact with Nigel are all going to be a problem if I want this LoRA to be about Nigel. I mean, what if I wanted her to ride on his shoulder? I tested it with the same Halo LoRA I used for the image above, with the added line "standing on masterchief shoulder", and got this as the best result...
I checked a few concepts to see if anyone had managed to pull off something similar to what I wanted. Link and Navi, Peter Pan and Tinkerbell, Misty and Togepi: none of them have tackled this issue outside of the last pair (which at best could only get Togepi to be held in a straight-on shot). But as I turned to Banjo & Kazooie and saw them in separate LoRAs with nothing tying them together, I considered that getting these two complex characters to act as one would truly be a difficult feat. It then dawned on me what was tying these two together.
The Backpack Theory is basically a concept for getting two individuals to interact while still keeping their forms true to their respective designs, without the use of a concept LoRA. It goes like this... when training a LoRA that'll be using 2 characters, you pick a trigger word for the first character, another trigger word for the second character, and a third trigger word that's only used on images that get both trigger word one and trigger word two.
For example, Banjo is trigger 1, Kazooie is trigger 2, and backpack is trigger 3. If the first image only has Banjo, it gets 1. If the second image only has Kazooie, it gets 2. If the third image has Banjo & Kazooie, it gets 1, 2, and 3. If done right, it should theoretically keep the characters aligned to the way they appear, whether you want only Banjo, only Kazooie, both, or even both but separated (just put the appropriate choice in the negative prompt). You could even replace the word "backpack" with "Banoie" if you want Banjo to keep that backpack of his.
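To make the tagging rule concrete, here's a small Python sketch of how the three trigger words could be assigned to training images and then used at prompt time. Everything in it is an assumption for illustration: the function names `caption_tags` and `prompt_terms`, the lowercase trigger spellings, and especially the negative-prompt handling, which is just my reading of the idea and hasn't been tested on an actual LoRA.

```python
def caption_tags(characters_in_image, triggers, link_trigger):
    """Return the trigger words for one training image.

    triggers maps each character name to its trigger word; link_trigger
    is the third word, added only when every character is in the image.
    """
    present = [name for name in triggers if name in characters_in_image]
    tags = [triggers[name] for name in present]
    if len(present) == len(triggers):
        tags.append(link_trigger)
    return tags


def prompt_terms(wanted, together, triggers, link_trigger):
    """Return (positive, negative) trigger lists for a generation prompt.

    Assumed behaviour: unwanted characters go in the negative prompt, and
    the link word goes positive only when all characters are wanted AND
    should appear joined together.
    """
    positive = [triggers[name] for name in triggers if name in wanted]
    negative = [triggers[name] for name in triggers if name not in wanted]
    if len(positive) == len(triggers) and together:
        positive.append(link_trigger)
    else:
        negative.append(link_trigger)
    return positive, negative


triggers = {"Banjo": "banjo", "Kazooie": "kazooie"}  # triggers 1 and 2
link = "backpack"                                     # trigger 3

print(caption_tags({"Banjo"}, triggers, link))             # image 1
print(caption_tags({"Kazooie"}, triggers, link))           # image 2
print(caption_tags({"Banjo", "Kazooie"}, triggers, link))  # image 3
print(prompt_terms({"Banjo", "Kazooie"}, False, triggers, link))
```

The last call is the "both, but separated" case: both character triggers stay in the positive prompt while the backpack word moves to the negative prompt.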
This method might be possible with LoRAs that have two characters in them, but as the thread title implies, this is a theory. Spending 500 buzz points and plenty of hours assigning tags and arranging shots just to test something that might fail is a hefty risk, so do think carefully before attempting this, as I'm still new to this stuff and regularly gamble on getting the images I want. But if this works, it could be expanded upon to make it possible for more than two characters to properly be in the same scene. There are quite a few fight scenes, goofy scenarios, and endearing moments that would be possible if we had a third word to play the backpack that ties the characters together.