
T-ponynai3

Updated: Jun 5, 2024
Published: May 27, 2024
Type: Checkpoint Trained
Base Model: Pony
Format: SafeTensor
Hash (AutoV2): AC17F32D24
Tags: base model, anime, nai, ponynai3
Author: Tonade

The VAE is already baked into the model; there is no need to add an external VAE.

The best generation strategy is to use hires fix at a moderate resolution rather than generating directly at a high resolution.
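As a rough sketch of that two-pass workflow, here is a minimal example using the diffusers library; the checkpoint filename, resolutions, upscale factor, and denoising strength are illustrative assumptions, not settings from the author.

# Minimal two-pass "hires fix" sketch with diffusers (assumed filename and settings).
import torch
from diffusers import StableDiffusionXLPipeline, AutoPipelineForImage2Image

# The VAE is baked into the checkpoint, so no external VAE is loaded.
pipe = StableDiffusionXLPipeline.from_single_file(
    "T-ponynai3.safetensors", torch_dtype=torch.float16
).to("cuda")

prompt = "score_9, score_8_up, score_7_up, 1girl, outdoors"   # placeholder subject tags
negative = "score_4, score_3, score_2, score_1, worst quality"

# Pass 1: generate at a moderate base resolution.
base = pipe(prompt, negative_prompt=negative, width=832, height=1216).images[0]

# Pass 2: upscale the image and refine it with img2img, approximating a hires-fix step.
img2img = AutoPipelineForImage2Image.from_pipe(pipe)
upscaled = base.resize((1248, 1824))
final = img2img(prompt, negative_prompt=negative, image=upscaled, strength=0.4).images[0]
final.save("output.png")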

[Unverified] Tonade is creating the T-ponynai3 model. Civitai ID: Tonade | 爱发电 (afdian.net)

This is my sponsorship page on Afdian. If you find the model useful and have the means, feel free to support it! Please don't feel obligated in any way. Thank you for every bit of support; I will keep exploring how to train the model better!

(Because the model can only be hosted on Tusi and Tensor at the same time, it is better to use it on Tusi. If there are any issues with it, please point them out to me.)

v5 adds 4 new styles, which can be used to fine-tune image details via the tags style_1 through style_4 (in theory, at least; in practice the effect is somewhat hit-or-miss).
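For example (the non-style tags below are placeholders, not recommendations from the author), keep the rest of the prompt fixed and swap only the style tag:

positive: (score_9,score_8_up,score_7_up), 1girl, cherry blossoms, style_1

positive: (score_9,score_8_up,score_7_up), 1girl, cherry blossoms, style_3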

This model fully supports LoRAs trained with ponyv6 as the base model, and LoRAs for ani3 and sdxl1.0 can also work to some extent.
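If you generate with the diffusers library rather than a WebUI, loading such a LoRA might look like the sketch below; the LoRA filename and scale are hypothetical examples, not files shipped with this model.

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "T-ponynai3.safetensors", torch_dtype=torch.float16
).to("cuda")

# Hypothetical ponyv6-based LoRA; the filename is only an example.
pipe.load_lora_weights("example_ponyv6_lora.safetensors")

image = pipe(
    "score_9, score_8_up, score_7_up, 1girl",          # placeholder subject tags
    negative_prompt="score_4, score_3, score_2, score_1",
    cross_attention_kwargs={"scale": 0.8},              # LoRA strength
).images[0]
image.save("lora_test.png")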

img2img testing based on v4.1 (a part that was overlooked in previous versions)

Pony is a god; compatibility gets full marks. This model supports ani and pony LoRAs.

The required prefix quality tags are the same as for Pony Diffusion:

positive:(score_9,score_8_up,score_7_up,score_6_up,score_5_up,score_4_up)

OR (score_9,score_8_up,score_7_up)

You can also add to the negative prompt:

negative: (score_4,score_3,score_2,score_1),

Normal NAI-style negative tags also work, for example:

negative: worst quality, bad hands, bad feet
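Put together, a full prompt pair might look like the following; the subject tags after the quality prefix are placeholders, not recommendations from the author:

positive: (score_9,score_8_up,score_7_up), 1girl, solo, smile, outdoors, style_1

negative: (score_4,score_3,score_2,score_1), worst quality, bad hands, bad feet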

hope u like it ᕕ(◠ڼ◠)ᕗ (based on nai3 and ponyv6)

Training notes: v1 used 94 images, v2 used 119, v3 used 348, and v3.5 used 474, all generated with NAI3; the LoRA trained on them was merged into the base model for fine-tuning. All artist tags that ponyv6 supports are supported, but no additional artist tags from nai3 were added. Using more than two artist tags may break the background. So far it has been found to generate Genshin Impact characters; I don't know about others, since I haven't tested this model much. I'm amazed at how well it reproduces the NAI3 art style. The base model is a merge of T-anime-xl, ponyv6, and ani3 (animagine3), which has not been released.

Training was done on my own 3090; v1 through v3.5 took 7, 12, 35, and 47 hours respectively.

v1

An interesting attempt

v2

Built on v1, with a slightly larger training set and about 30 hours of parameter trial and error, but the trained art style still shows some overfitting, such as doubled navels and messy hair.

v3

v3's limbs are better than v2's. In terms of understanding footfocus, v3 can generate feet with greater visual impact and handle more difficult perspectives. v3's hair also looks less AI-generated than v2's: v2's training set was too small, so its hair was slightly overfitted, and the doubled navels that occasionally appeared in v2 are gone as well. Overall, a training set three times the size of v2's and a larger dim parameter make the art style fit more naturally, and expressiveness with long prompts is far stronger than v2's.

v3.5

In this version the quality tags are less strictly required: you can generate without Pony's aesthetic-score quality tags at all. In testing, images occasionally come out as meaningless blocks of color; just replace the aesthetic-score quality tags with the quality words commonly used for SD 1.5, for example swapping score_1 and score_2 for worst quality. For this version I added about 150 more training images to balance and enrich the art style, and reduced the initial slope of the learning curve, which makes the model less overfitted and able to accommodate more LoRAs and more imaginative prompts. Overall this version is freer than v3, it depicts men far better than v3, and under some prompts the colors and style are not as garishly bright and greasy.
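Concretely, if meaningless color blocks appear, swapping the negative prompt along these lines should be enough (illustrative only):

instead of: negative: (score_4,score_3,score_2,score_1)

use: negative: worst quality, bad hands, bad feet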

v4

This version used 798 images as training material and was trained for 90 hours on a 3090. Compared with v3.5, composition under certain prompts and the rendering of certain body parts are more correct, for example ghosted fingers and overlapping body parts. For prompts, I still focused training on medium-length and slightly shorter prompts; after all, nobody likes writing a long string of prompts just to get a good image, right? With Pony's aesthetic-score quality tags removed, image quality is much improved over v3.5, and outputs tend to look flatter rather than three-dimensional, closer to a classic anime style. Testing how the number of training images affects the fine-tuning of ponyv6 is nearly complete; the next step is to work on the prompt training tags and try to fit more controllable prompts into Pony's limited amount of per-run training material (for example adding the aesthetic scores back in; the current training logic still overrides Pony's aesthetic-score quality tags with mainstream quality words). I will also keep adding suitable new training material, such as scene images and more foot images (v4's foot training material seems a bit scarce).

v4.1

First, I apologize to all users for releasing another new version in such a short time; it really tests your computer's memory and network speed. O_O

This new version is a limb-tuning pass over v4. v4's limbs were genuinely hard to control, and in my tests over the past few days the hand success rate did not meet my expectations. So my friend 木猫猫猫 and I made some adjustments and improvements to v4, and v4.1's limbs finally meet my expectations. I will post several XY plots to show clearly how much v4.1 improves over v4 when generating with the same parameters.

v5

The training material for this version was reduced. After v4's setback, I started another project to test my idea from the angle of a small VRAM footprint: training four style LoRAs in different art styles adapted to T-ponynai3 (the original models were also uploaded to Civitai). After testing their compatibility, I began training these four styles into T-ponynai3-v5 as additives. Surprisingly, v5's line quality improved by a whole tier, probably because one of the training sets was very finely drawn. I tagged the four styles with the prompts style_1 through style_4; unfortunately, for some reason the four styles did not stay separate, or their effect is weak, and instead blended nicely into the original style. Although the goal of supporting multiple styles was not reached, the texture of the original NAI3-like style was raised a notch, and perhaps the next version can push this further. (I really like playing games, and not being able to play PC games while training is hard on me.)

A summary of some issues with the v5 version.

1. LoRA compatibility, limbs, and blurry eyes. The LoRA compatibility issue comes from using a final merge weight that was somewhat too high for this training, which can cause overfitting in some cases. This optimized version lowers that weight, so the rate of broken limbs and the compatibility with some LoRAs should be better; for reference, I ran several comparison images using a style LoRA trained on v4.1. The blurry eyes are probably because I trained style_1 on source material whose eyes were already blurry; this can be improved by using style_3 or style_4.

2. Overexposed volumetric light. I never ran into this during testing. The likely cause is that I trained with a noise offset parameter, which makes the model more sensitive to light-related prompt tags, so a light tag at the same weight as before now gives brighter results. I suggest not using parentheses and numbers to raise the weight; because SDXL is sensitive to prompt tags, try repeating the same tag several times instead (see the short example after this list), which is less likely to produce extreme results. Using this parameter was meant to fix yellowish outputs under short prompts, and I ran several comparison images for reference.

3. Reduced model complexity. In theory and in actual testing, v5 should be a cleaner and more varied model than previous versions, and with the right prompts it should be more precisely expressive; again, I ran several comparison images. This training set did not use overly complex material, because I think overly complex images push the result toward overfitting, which inevitably costs some detail.
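As an example of the repetition suggestion in point 2 (the light tag here is only an illustration, not a tag taken from the training data):

instead of: (volumetric lighting:1.3)

try: volumetric lighting, volumetric lighting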

Purpose: I want to end up with a model that is clearly different from the previous version, rather than releasing one that is almost identical to it. Your feedback is a great opportunity for trial and error; on my own I really cannot afford much of it. In the next version I will try to increase the amount of material for each art style, so that the different styles blend well together and can also be separated, switching between them with specific prompts; that may require some new training techniques. Thank you all for your feedback!