Mistral for Comfy/ERNIE
I thought I would share a few things I learned while training Mistral 3 3B.
When full finetuning, a learning rate of 1e-5 will cause the model to overtrain very quickly.
If you are targeting layers 0 and 1, you may need to lower the LR to 1e-6 or freeze those layers entirely.
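Here is a minimal sketch of how that can look with Hugging Face Transformers, assuming a Mistral-style module layout (`model.model.layers`); the checkpoint name is a placeholder and the exact module path varies between models:

```python
# Minimal sketch, assuming a Hugging Face causal-LM checkpoint.
# "your-mistral-3b-checkpoint" is a placeholder, and the path to the
# transformer blocks (model.model.layers) depends on the architecture.
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("your-mistral-3b-checkpoint")

# Freeze the first two transformer blocks (layers 0 and 1) so they are not
# updated during the full finetune.
for idx in (0, 1):
    for param in model.model.layers[idx].parameters():
        param.requires_grad = False

# Keep the learning rate conservative; 1e-5 overtrained quickly in my runs,
# so drop toward 1e-6 if the early layers stay trainable.
training_args = TrainingArguments(
    output_dir="out",
    learning_rate=1e-6,
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
)
```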
You can train on image/caption pairs, standard question/answer format, or plain next-token prediction.
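For illustration, here is roughly what one record of each type looks like; the field names and example content are just my own convention, not a schema any trainer requires:

```python
# Sketch of the three data shapes; field names are my own convention.

# Image/caption pair for the vision side.
image_caption_example = {
    "image": "samples/0001.png",
    "caption": "A red-brick lighthouse on a rocky shore at sunset.",
}

# Standard question/answer (chat) format.
question_answer_example = {
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris."},
    ],
}

# Plain next-token prediction: raw text with no structure at all.
token_prediction_example = {
    "text": "The lighthouse keeper climbed the spiral stairs before dawn...",
}
```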
The model is a jack of all trades. With the 3B I managed to get an OK NSFW caption model that rarely hallucinates, but I do not think it is superior to purpose-built models of the same size.
I have not decided whether I will continue with the 3B as a captioning model or just move on. The NSFW training and logic puzzles did improve its usefulness with ERNIE.
I was very careful with my finetune not to affect the logic in ERNIE.



